Abstract. The following distributed coalescence protocol was introduced by Dahlia Malkhi in 2006
motivated by applications in social networking. Initially there are n agents wishing to coalesce into 
one cluster via a decentralized stochastic process, where each round is as follows: Every cluster 
flips a fair coin to dictate whether it is to issue or accept requests in this round. Issuing a request 
amounts to contacting a cluster randomly chosen proportionally to its size. A cluster accepting 
requests is to select an incoming one uniformly (if there are such) and merge with that cluster. 
Empirical results by Fernandess and Malkhi suggested the protocol concludes in O(log n) rounds
with high probability, whereas numerical estimates by Oded Schramm, based on an ingenious 
analytic approximation, suggested that the coalescence time should be super-logarithmic. 

Our contribution is a rigorous study of the stochastic coalescence process with two consequences. 
First, we confirm that the above process indeed requires super-logarithmic time w.h.p., where the 
inefficient rounds are due to oversized clusters that occasionally develop. Second, we remedy this 
by showing that a simple modification produces an essentially optimal distributed protocol: If 
clusters favor their smallest incoming merge request then the process does terminate in O(log n)
rounds w.h.p., and simulations show that the new protocol readily outperforms the original one. 
Our upper bound hinges on a potential function involving the logarithm of the number of clusters 
and the cluster-susceptibility, carefully chosen to form a supermartingale. The analysis of the lower 
bound builds upon the novel approach of Schramm which may find additional applications: Rather 
than seeking a single parameter that controls the system behavior, instead one approximates the 
system by the Laplace transform of the entire cluster-size distribution. 



1. Introduction 

The following stochastic distributed coalescence protocol was proposed by Dahlia Malkhi in 2006, 
motivated by applications in social networking and the reliable formation of peer-to-peer networks 
(see [TU] for more on these applications). The objective is to coalesce n participating agents into a 
single hierarchical cluster reliably and efficiently. To do so without relying on a centralized authority,
the protocol first identifies each agent as a cluster (a singleton), and then proceeds in rounds as 
follows: 

(1) Each cluster flips a fair coin to determine whether it will be issuing a merge-request or accepting 
requests in the upcoming round. 

(2) Issuing a request amounts to selecting another cluster randomly proportionally to its size. 

(3) Accepting requests amounts to choosing an incoming request (if there are any) uniformly at 
random and proceeding to merge with that cluster. 

In practice, each cluster is in fact a layered tree whose root is entrusted with running the protocol,
e.g. each root decides whether to issue or accept requests in a given round. When attempting to
merge with another cluster, the root of cluster $C_i$ simply chooses a vertex $v$ uniformly out of $[n]$,
which then propagates the request to its own root. This therefore corresponds to choosing the cluster
$C_j$ proportionally to $|C_j|$. This part of the protocol is well-justified by the fact that agents within
a cluster typically have no information on the structure of other clusters in the system.
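
One round of the protocol described in steps (1)-(3) can be sketched in Python as follows. This is only an illustrative sketch, not the authors' simulation code: the function names `uniform_round` and `coalescence_time` are hypothetical, and two details are assumptions of the sketch rather than statements from the text, namely that a request landing on an issuing cluster or on the issuer's own cluster is simply dropped.

```python
import random


def uniform_round(sizes, rng):
    """Run one round of the uniform protocol on a list of integer cluster sizes."""
    n = sum(sizes)
    k = len(sizes)
    # Step (1): every cluster flips a fair coin; True means "issue a request".
    issuing = [rng.random() < 0.5 for _ in range(k)]
    # Step (2): an issuer picks an agent uniformly out of the n agents; the cluster
    # owning that agent is therefore chosen proportionally to its size.
    owner = [c for c, size in enumerate(sizes) for _ in range(size)]
    requests = {}
    for c in range(k):
        if issuing[c]:
            target = owner[rng.randrange(n)]
            # Assumption: requests to oneself or to another issuer are dropped.
            if target != c and not issuing[target]:
                requests.setdefault(target, []).append(c)
    # Step (3): every acceptor with incoming requests picks one uniformly and merges.
    new_sizes = list(sizes)
    absorbed = set()
    for target, incoming in requests.items():
        chosen = rng.choice(incoming)
        new_sizes[target] += sizes[chosen]
        absorbed.add(chosen)
    return [s for c, s in enumerate(new_sizes) if c not in absorbed]


def coalescence_time(n, seed=0):
    """Number of rounds until n singletons coalesce into one cluster."""
    rng = random.Random(seed)
    sizes = [1] * n
    rounds = 0
    while len(sizes) > 1:
        sizes = uniform_round(sizes, rng)
        rounds += 1
    return rounds
```

Since every merge pairs one acceptor with one issuer, the number of clusters can at most halve in a round, so `coalescence_time(n)` is never below $\log_2 n$.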
A second feature of the protocol is the symmetry between the roles of issuing/accepting requests
played by the clusters. Clearly, every protocol enjoying this feature would have (roughly) at most
half of its clusters become acceptors in any given round, and as such could terminate within O(log n)
rounds. Furthermore, on an intuitive level, as long as all clusters are of roughly the same size (as is
the case initially), there are few "collisions" (multiple clusters issuing a request to the same cluster)
each round and hence the effect of a round is similar to that of merging clusters according to a 
random perfect matching. As such, one might expect that the protocol should conclude with a 
roughly balanced binary tree in logarithmic time. 

Indeed, empirical evidence by Fernandess and Malkhi [TT] showed that this protocol seems highly 
efficient, typically taking a logarithmic number of rounds to coalesce. However, rigorous
performance guarantees for the protocol were not available.

While there are numerous examples of stochastic processes that have been successfully analyzed 
by means of identifying a single tractable parameter that controls their behavior, here it appears 
that the entire distribution of the cluster-sizes plays an essential role in the behavior of the system. 
Demonstrating this is the following example: Suppose that the cluster $C_1$ has size $n - o(\sqrt{n})$ while
all others are singletons. In this case it is easy to see that with high probability all of the
merge-requests will be issued to $C_1$, which will accept at most one of them (we say an event holds with high
probability, or w.h.p. for brevity, if its probability tends to 1 as $n \to \infty$). Therefore, starting from
this configuration, coalescence will take at least $n^{1/2 - o(1)}$ rounds w.h.p., a polynomial slowdown.
Of course, this scenario is extremely unlikely to arise when starting from n individual agents, yet 
possibly other mildly unbalanced configurations are likely to occur and slow the process down. 

In 2007, Oded Schramm proposed a novel approach to the problem, approximately reducing it to 
an analytic problem of determining the asymptotics of a recursively defined family of real functions. 
Via this approximation framework Schramm then gave numerical estimates suggesting that the 
running time of the stochastic coalescence protocol is w.h.p. super-logarithmic. Unfortunately, the 
analytical problem itself seemed highly nontrivial and overall no bounds for the process were known. 

1.1. New results. In this work we study the stochastic coalescence process with two main
consequences. First, we provide a rigorous lower bound confirming that this process w.h.p. requires
a super-logarithmic number of rounds to terminate. Second, we identify the vulnerability in the 
protocol, namely the choice of which merge-request a cluster should approve: While the original 
choice seems promising in order to maintain the balance between clusters, it turns out that typical 
deviations in cluster-sizes are likely to be amplified by this rule and lead to irreparably unbalanced 
configurations. On the other hand, we show that a simple modification of this rule to favor the 
smallest incoming request is already enough to guarantee coalescence in O(log n) rounds w.h.p.
(Here and in what follows we let $f \lesssim g$ denote that $f = O(g)$ while $f \asymp g$ is short for $f \lesssim g \lesssim f$.)

Theorem 1. The uniform coalescence process $\mathcal{U}$ coalesces in $\tau_c(\mathcal{U}) \gtrsim \log n \cdot \frac{\log\log n}{\log\log\log n}$ rounds w.h.p.
Consider a modified size-biased process $\mathcal{S}$ where every accepting cluster $C_i$ has the following rule:

• Ignore requests from clusters of size larger than $|C_i|$.

• Among other requests (if any) select one issued by a cluster $C_j$ of smallest size.
Then the coalescence time of the size-biased process satisfies $\tau_c(\mathcal{S}) \asymp \log n$ w.h.p.

Observe that the new protocol is easy to implement efficiently in practice as each root can keep 
track of the size of its cluster and can thus include it as part of the merge-request. 
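
The acceptance rule of the size-biased process is simple enough to state as a few lines of Python. The following is a minimal sketch, assuming each merge-request carries the size of the issuing cluster as suggested above; the function name `choose_request_size_biased` is hypothetical, and breaking ties by the lowest index is an assumption of this sketch (the rule only asks for a request of smallest size).

```python
def choose_request_size_biased(own_size, incoming_sizes):
    """Return the index of the request an acceptor approves, or None.

    own_size       -- size of the accepting cluster
    incoming_sizes -- sizes of the clusters that issued a request to it
    """
    # Ignore requests from clusters larger than the acceptor itself.
    eligible = [(size, idx) for idx, size in enumerate(incoming_sizes)
                if size <= own_size]
    if not eligible:
        return None
    # Among the rest, pick a request of smallest size (ties: lowest index).
    return min(eligible)[1]
```

For instance, an acceptor of size 5 receiving requests from clusters of sizes 7, 3, 2 and 9 ignores the first and last and approves the request of size 2.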
Figure 1. The left plot compares the running times for the two processes. Statistics are derived
from 100 independent runs of each process, for each n £ {1024, 2048, . . . , 2^"}. The right plot tracks 
the ratio between the maximum and average cluster-sizes, through a single run of each process, for 
n = 10®. There, the uniform process took 128 rounds, while the size-biased process finished in 96. 

1.2. Empirical results. Our simulations show that the running time of the size-biased process
is approximately $5 \log_2 n$. Moreover, they further demonstrate that the new size-biased process
empirically performs substantially better than the uniform process even for fairly small values of
n, i.e. the improvement appears not only asymptotically in the limit but already for ordinary input
sizes. These results are summarized in Figure 1, where the plot on the left clearly shows how the
uniform process diverges from the linear (in logarithmic scale) trend corresponding to the runtime
of the size-biased process. The rightmost plot identifies the crux of the matter: the uniform process
rapidly produces a highly skewed cluster-size distribution, which slows it down considerably.

1.3. Related work. There is extensive literature on stochastic coalescence processes whose various 
flavors fit the following scheme: The clusters act via a continuous-time process where the coalescence 
rate of two clusters with given masses x, y (which can be either discrete or continuous) is dictated 
up to re-scaling by a rate kernel K. A notable example of this is Kingman's coalescent [18], which
corresponds to the kernel K(x, y) = 1 and has been intensively studied in mathematical population
genetics; see e.g. [7] for more on Kingman's coalescent and its applications in genetics. Other rate
kernels that have been thoroughly studied include the additive coalescent K(x, y) = x + y, which
corresponds to Aldous's continuum random tree [1], and the multiplicative coalescent K(x, y) = xy,
which corresponds to Erdős-Rényi random graphs [9] (see the books [HIlT]). For further information
on these as well as other coalescence processes, whose applications range from physics to chemistry
to biology, we refer the reader to the excellent survey of Aldous [2].

A major difference between the classical stochastic coalescence processes mentioned above and 
those studied in this work is the synchronous nature of the latter ones: Instead of individual merges 
whose occurrences are governed by independent exponentials, here the process is comprised of 
rounds where all clusters act simultaneously and the outcome of a round (multiple disjoint merges) 
is a function of these combined actions. This framework introduces delicate dependencies between
the clusters, and rather than having the coalescence rate of two clusters be given by the rate kernel
K as a function of their masses, here it is a function of the entire cluster distribution. For instance,
suppose nearly all of the mass is in one cluster Ci (which thus attracts almost all merge requests); 
its coalescence rate with a given cluster Cj in the uniform coalescence process U clearly depends 
on the total number of clusters at that given moment, and similarly in the size-biased coalescence 
process S it depends on the sizes of all other clusters, viewed as competing with Cj over this merge. 
In face of these mentioned dependencies, the task of analyzing the evolution of the clusters along 
the high-dimensional stochastic processes U and S becomes highly nontrivial. 

In terms of applications and related work in Computer Science, the processes studied here have 
similar flavor to those which arose in the 1980s, most notably the Random Mate algorithm
introduced by Reif, and used by Gazit [15] for parallel graph components and by Miller and Reif [20] for
parallel tree contraction. However, as opposed to the setting of those algorithms, a key difference 
here is the fact that as the process evolves through time each cluster is oblivious to the distribution 
of its peers at any given round (including the total number of clusters for that matter). Therefore 
for instance it is impossible for a cluster to sample from the uniform distribution over the other 
clusters when issuing its merge request. 

For another related line of works in Computer Science, recall that the coalescence processes 
studied in this work organize n agents in a hierarchic tree, where each merged cluster reports to 
its acceptor cluster. This is closely related to the rich and intensively studied topic of Randomized 
Leader Elections (see e.g. [6], [12], [22], [23], [27]), where a computer network comprised of n processors
attempts to single out a leader (in charge of communication, etc.) by means of a distributed 
randomized process generating the hierarchic tree. Finally, studying the dynamics of randomly 
merging sets is also fundamental to understanding the average-case performance of disjoint-set data 
structures (see e.g. the works of Bollobás and Simon [5], Knuth and Schönhage [19] and Yao [26]).
These structures, which are of fundamental importance in computer science, store collections of 
disjoint sets, and support two operations: (i) taking the union of a pair of sets, and (ii) determining 
which set a particular element is in. See e.g. [H] for a survey of these data structures. The processes 
studied here precisely consider the evolution of a collection of disjoint sets under random merge 
operations, and it is plausible that the tools used here could contribute to advances in that area. 

1.4. Main techniques. As we mentioned above, the main obstacle in the coalescence processes 
studied here is that since requests go to other clusters with probability proportional to their size, 
the largest clusters can create a bottleneck, absorbing all requests yet each granting only one per 
round. An intuitive approach for analyzing the size-biased process S would be to track a statistic 
that would warn against this scenario, with the most obvious candidate being the size of the largest 
cluster. However, simulations indicate that this alone will be insufficient as the largest cluster does 
in fact grow out of proportion in typical runs of the process. Nevertheless, the distribution of large 
clusters turns out to be sparse. The key idea is then to track a smoother parameter involving the 
susceptibility, which is essentially the second moment of the cluster-size distribution. 

To simplify notation, normalize the cluster-sizes to sum to 1 so that the initial distribution
consists of n clusters of size $1/n$ each. With this normalization, the susceptibility $\chi_t$ is defined as
the sum of squares of cluster-sizes after the t-th round. (We note in passing that this parameter
has played a central role in the study of the phase-transition in Percolation and Random Graphs,
see e.g. [16], [25].) The proof that the size-biased protocol is optimal hinges on a carefully chosen
potential function $\Phi(t) = \chi_t \kappa_t + C \log \kappa_t$, where $\kappa_t$ denotes the number of clusters after the t-th round
and C is an absolute constant chosen to turn $\Phi$ into a supermartingale. In Sections 3 and 4 we
will control the evolution of $\Phi$ and prove our upper bound on the running time of the size-biased
process.

Figure 2. Numerical estimations by Oded Schramm for the functions $G_t(s)$ from his analytic
approximation of the uniform coalescence process. The left plot features $G_t(s)$ for $t \in \{0, 2, \ldots, 40\}$
and $s \in [0, 1]$ and demonstrates how these increase with t. The right plot focuses on $G_t(\tfrac12)$ and
suggests that $G_t(\tfrac12) \to 1$ and that in turn the coalescence rate should be super-logarithmic.

The analysis of the uniform process $\mathcal{U}$ is delicate and relies on rigorizing and analyzing the novel
framework of Schramm [24] for approximating the problem by an analytic one. We believe this
technique is of independent interest and may find additional applications in the analysis of
high-dimensional stochastic processes. Rather than seeking a single parameter to summarize the system
behavior, one measures the system using the Laplace transform of the entire cluster-size
distribution:

Definition 1.1. For any integer $t \ge 0$ let $\mathcal{F}_t$ be the $\sigma$-algebra generated by the first t rounds of the
process. Conditioned on $\mathcal{F}_t$, define the functions $F_t(s)$ and $G_t(s)$ on the domain $\mathbb{R}$ as follows. Let
$\kappa$ be the number of clusters and let $w_1, \ldots, w_\kappa$ be the normalized cluster-sizes after t rounds. Set

$$F_t(s) = \sum_{i=1}^{\kappa} \exp(-w_i s), \qquad G_t(s) = \frac{1}{\kappa} F_t(\kappa s). \qquad (1.1)$$

As we will further explain in Section 2, the Laplace transform $F_t$ simultaneously captures all the
moments of the cluster-size distribution, in a manner analogous to the moment generating function
of a random variable. This form is particularly useful in our application as we will see in Section 5
that the specific evaluation $G_t(\tfrac12)$ governs the expected coalescence rate. Furthermore, it turns out
that it is possible to estimate values of $F_t$ (and $G_t$) recursively. Although the resulting recursion is
nonstandard and highly complex, a somewhat intricate analysis eventually produces a lower bound
for the uniform process.
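
To make Definition 1.1 concrete, the following Python sketch evaluates the two transforms for a given list of normalized cluster-sizes. The function names `laplace_F` and `rescaled_G` are hypothetical and chosen only for this illustration.

```python
import math


def laplace_F(weights, s):
    """F(s): sum of exp(-w_i * s) over the normalized cluster-sizes w_i."""
    return sum(math.exp(-w * s) for w in weights)


def rescaled_G(weights, s):
    """G(s) = (1/k) * F(k * s), where k is the number of clusters."""
    k = len(weights)
    return laplace_F(weights, k * s) / k


# A balanced configuration (k equal clusters) has G(s) = exp(-s) exactly.
balanced = [1 / 10] * 10
# A skewed configuration: one cluster holds 90% of the mass.
skewed = [0.9] + [0.1 / 9] * 9
```

Evaluating at $s = \tfrac12$ shows the effect of imbalance: `rescaled_G(balanced, 0.5)` equals $e^{-1/2}$, whereas `rescaled_G(skewed, 0.5)` is noticeably closer to 1.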

1.5. Organization. The rest of this paper is organized as follows. In Section 2 we describe
Schramm's analytic approach for approximating the uniform process $\mathcal{U}$. Sections 3 and 4 are
devoted to the size-biased process $\mathcal{S}$: In the former we prove that $E[\tau_c(\mathcal{S})] = O(\log n)$ and in
the latter we build on this proof together with additional ideas to show that $\tau_c(\mathcal{S}) = O(\log n)$
w.h.p. The final section, Section 5, builds upon Schramm's aforementioned framework to produce
a super-logarithmic lower bound for $\tau_c(\mathcal{U})$.
2. Schramm's analytic approximation framework for the uniform process

In this section we describe Schramm's analytic approach as it was presented in [24] for analyzing
the uniform coalescence process $\mathcal{U}$, as well as the numerical evidence that Schramm obtained based
on this approach suggesting that $\tau_c(\mathcal{U})$ is super-logarithmic. Throughout this section we write
approximations loosely as they were sketched by Schramm and postpone any arguments on their
validity (including concentration of random variables, etc.) to Section 5, where we will turn elements
from this approach into a rigorous lower bound on $\tau_c(\mathcal{U})$.

Let $\mathcal{F}_t$ denote the $\sigma$-algebra generated by the first t rounds of the coalescence process $\mathcal{U}$. The
starting point of Schramm's approach was to examine the following function conditioned on $\mathcal{F}_t$:

$$F_t(s) = \sum_{i=1}^{\kappa_t} \exp(-w_i s),$$

where $\kappa_t$ is the number of clusters after t rounds and $w_1, \ldots, w_{\kappa_t}$ denote the normalized cluster-sizes
at that time (see Definition 1.1). The benefit that one could gain from understanding the behavior
of $F_t(s)$ is obvious as $F_t(0)$ recovers the number of clusters at time t.

More interesting is the following observation of Schramm regarding the role that $F_t(\kappa_t/2)$ plays
in the evolution of the clusters. Conditioned on $\mathcal{F}_t$, the probability that the cluster $C_i$ receives a
merge request from another cluster $C_j$ is $\tfrac12 w_i$ (the factor $\tfrac12$ accounts for the choice of $C_j$ to issue
rather than accept requests). Thus, the probability that $C_i$ will receive any incoming request in
round t + 1 and independently decide to be an acceptor is

$$\tfrac12 \left[1 - (1 - w_i/2)^{\kappa_t - 1}\right] \approx \tfrac12 \left[1 - \exp(-w_i \kappa_t/2)\right].$$

On this event, $C_i$ will account for one merge at time t + 1, and summing this over all clusters yields

$$E[\kappa_{t+1} \mid \mathcal{F}_t] \approx \kappa_t - \tfrac12 \sum_{i=1}^{\kappa_t} \left[1 - \exp(-w_i \kappa_t/2)\right] = \tfrac12 \left[\kappa_t + F_t(\kappa_t/2)\right],$$

or equivalently, re-scaling $F_t(s)$ into $G_t(s) = (1/\kappa_t) F_t(\kappa_t s)$ as in Eq. (1.1),

$$E[\kappa_{t+1}/\kappa_t \mid \mathcal{F}_t] \approx \tfrac12 \left[1 + G_t(\tfrac12)\right]. \qquad (2.1)$$

In order to have $\tau_c(\mathcal{U}) \asymp \log n$ the number of clusters would need to typically drop by at least a
constant factor at each round. This would require the ratio in (2.1) to be bounded away from 1,
or equivalently, $G_t(\tfrac12)$ should be bounded away from 1.
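
The quality of the exponential approximation behind (2.1) can be checked numerically. The sketch below is an illustration only: it compares the per-round ratio obtained from the product form of the acceptance probability with the Laplace-transform form, for a given list of normalized cluster-sizes. Both function names are hypothetical, and the product form is itself the heuristic estimate from the text rather than an exact law of the process.

```python
import math


def ratio_product_form(weights):
    """Heuristic E[k'/k] using (1/2) * [1 - (1 - w_i/2)**(k - 1)] per cluster."""
    k = len(weights)
    merges = sum(0.5 * (1 - (1 - w / 2) ** (k - 1)) for w in weights)
    return (k - merges) / k


def ratio_laplace_form(weights):
    """Heuristic E[k'/k] = (1/2) * [1 + G(1/2)], with G(1/2) = F(k/2) / k."""
    k = len(weights)
    g_half = sum(math.exp(-w * k / 2) for w in weights) / k
    return 0.5 * (1 + g_half)


balanced = [1 / 1000] * 1000
```

For the balanced configuration the two estimates agree to about three decimal places, and the Laplace form equals $\tfrac12(1 + e^{-1/2})$, i.e. roughly a 20% drop in the number of clusters per round.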

Unfortunately, the evolution of the sequence $G_t(\tfrac12) = (1/\kappa_t) F_t(\kappa_t/2)$ appears to be quite complex
and there does not seem to be a simple way to determine its limiting behavior. Nevertheless,
Schramm was able to write down an approximate recursion for the expected value of $F_{t+1}$ in terms
of multiple evaluations of $F_t$ by observing the following: On the above event that $C_i$ chooses to
accept the merge request of some other cluster $C_j$, by definition of the process $\mathcal{U}$ the identity of the
cluster $C_j$ is uniformly distributed over all $\kappa_t - 1$ clusters other than $C_i$. Hence,

$$E[F_{t+1}(s) - F_t(s) \mid \mathcal{F}_t] \approx \sum_i \tfrac12 \left(1 - e^{-w_i \kappa_t/2}\right) \frac{1}{\kappa_t} \sum_{j \ne i} \left( e^{-(w_i + w_j)s} - e^{-w_i s} - e^{-w_j s} \right).$$

Ignoring the fact that the last sum in the approximation skips the diagonal terms j = i, one arrives
at a summation over all $1 \le i, j \le \kappa_t$ of exponents similar to those in the definition of $F_t$ with an
argument of either $s$, $\kappa_t/2$ or $s + \kappa_t/2$, which after rearranging gives

$$E[F_{t+1}(s) \mid \mathcal{F}_t] \approx \tfrac12 F_t(s + \kappa_t/2) + \frac{F_t(s)}{2\kappa_t} \left[ F_t(s) + F_t(\kappa_t/2) - F_t(s + \kappa_t/2) \right].$$

To turn the above into an expression for $G_{t+1}(s)$ one needs to evaluate $F_{t+1}(\kappa_{t+1} s)$ rather than
$F_{t+1}(\kappa_t s)$, to which end the approximation $\kappa_{t+1} \approx \tfrac12 [1 + G_t(\tfrac12)] \kappa_t$ can be used based on (2.1).
Additionally, for the starting point of the recursion, note that the initial configuration of $w_i = 1/\kappa_0$
for all $1 \le i \le \kappa_0$ has $G_0(s) = \exp(-s)$. Altogether, Schramm obtained the following deterministic
analytic recurrence, whose behavior should (approximately) dictate the coalescence rate:

$$g_0(s) = \exp(-s),$$

$$g_{t+1}(s) = \frac{1}{2\alpha} \left[ g_t(\alpha s)^2 - g_t(\alpha s + \tfrac12)\, g_t(\alpha s) + g_t(\alpha s + \tfrac12) + g_t(\tfrac12)\, g_t(\alpha s) \right] \quad \text{where } \alpha = \tfrac12 \left[1 + g_t(\tfrac12)\right].$$

In light of this, aside from the task of assessing how good of an approximation the above defined
functions $g_t$ provide for the random variables $G_t$ along the uniform coalescence process $\mathcal{U}$, the other
key question is whether the sequence $g_t(\tfrac12)$ converges to 1 as $t \to \infty$, and if so, at what rate.
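
For small t the recurrence can be evaluated directly. The following Python sketch does so by plain recursion with memoization; it is a naive illustration under the stated recurrence, not a reconstruction of Schramm's numerical method, and the function name `g` and the choice of `functools.lru_cache` are assumptions of the sketch. Each level calls the previous one at up to three arguments, so the cost grows roughly like $3^t$ and only small values of t are practical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def g(t, s):
    """Naive evaluation of g_t(s), with g_0(s) = exp(-s)."""
    if t == 0:
        return math.exp(-s)
    half = g(t - 1, 0.5)
    alpha = 0.5 * (1 + half)
    a = g(t - 1, alpha * s)          # g_{t-1}(alpha * s)
    b = g(t - 1, alpha * s + 0.5)    # g_{t-1}(alpha * s + 1/2)
    return (a * a - b * a + b + half * a) / (2 * alpha)


# The quantity of interest in the text: g_t(1/2) for the first few rounds.
first_values = [g(t, 0.5) for t in range(7)]
```

A useful sanity check is that $g_t(0) = 1$ for every t, which follows from the recurrence by induction and mirrors the fact that $G_t(0) = 1$.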

For the latter, as the complicated definition of $g_{t+1}$ attests, analyzing the recursion of $g_t$ seems
highly nontrivial. Moreover, a naive evaluation of $g_t$ involves exponentially many terms, making
numerical simulations already challenging. The computer-assisted numerical estimates performed
by Schramm for the above recursion, shown in Figure 2, seemed to suggest that indeed $g_t(\tfrac12) \to 1$
(albeit very slowly), which should lead to a super-logarithmic coalescence time for $\mathcal{U}$. However, no
rigorous results were known for the limit of $g_t(\tfrac12)$ or its stochastic counterpart $G_t(\tfrac12)$.

As we show in Section 5, in order to turn Schramm's argument into a rigorous lower bound on
$\tau_c(\mathcal{U})$, we move our attention away from the sought value of $G_t(\tfrac12)$ and focus instead on $G_t(1)$. By
manipulating Schramm's recursion for $G_t$ and combining it with additional analytic arguments and
appropriate concentration inequalities, we show that as long as $\kappa_t$ is large enough and $G_t(\tfrac12) < 1 - \delta$
for some fixed $\delta > 0$, then typically $G_{t+1}(1) > G_t(1) + \epsilon$ for some $\epsilon(\delta) > 0$. Since by definition
$0 \le G_t(1) \le 1$ this can be used to show that ultimately $G_t(\tfrac12) \to 1$ w.h.p., and a careful quantitative
version of this argument produces the rigorous lower bound on $\tau_c(\mathcal{U})$ stated in Theorem 1.

3. Expected running time of the size-biased process 

The goal of this section is to prove that the expected time for the size-biased process to complete
has logarithmic order, as stated in Proposition 3.1. Following a few simple observations on the
process we will prove this proposition using two key lemmas, Lemmas 3.4 and 3.5, whose proofs
will appear in §3.2 and §3.3 respectively. In Section 4 we extend the proof of this proposition using
some additional ideas to establish that the coalescence time is bounded by O(log n) w.h.p.

Proposition 3.1. Let $\tau_c = \tau_c(\mathcal{S})$ denote the coalescence time of the size-biased process $\mathcal{S}$. Then
there exists an absolute constant $C > 0$ such that $E_1[\tau_c] \le C \log n$, where $E_1[\cdot]$ denotes expectation
w.r.t. an initial cluster distribution comprised of n singletons.

Throughout Sections 3 and 4 we refer only to the size-biased process and use the following
notation. Define the filtration $\mathcal{F}_t$ to be the $\sigma$-algebra generated by the process up to and including
the t-th round. Let $\kappa_t$ denote the number of clusters after the conclusion of round t, noting that
with these definitions we are interested in bounding the expected value of the stopping time

$$\tau_c = \min\{t : \kappa_t = 1\}. \qquad (3.1)$$



As mentioned in the introduction, we normalize the cluster-sizes so that they sum to 1. Finally,
the susceptibility $\chi_t$ denotes the sum of squares of the cluster-sizes at the end of round t.

Observe that by Cauchy-Schwarz, if $w_1, \ldots, w_{\kappa_t}$ are the cluster-sizes at the end of round t then
we always have

$$\chi_t \kappa_t \ge \Big( \sum_{i=1}^{\kappa_t} w_i \Big)^2 = 1, \qquad (3.2)$$

with equality iff all clusters have the same size. Indeed, the susceptibility $\chi_t$ measures the variance
of the cluster-size distribution. When $\chi_t$ is smaller (closer to $\kappa_t^{-1}$), the distribution is more uniform.
We further claim that

$$\chi_{t+1} \le 2\chi_t \quad \text{for all } t. \qquad (3.3)$$

To see this, note that if a cluster of size a merges with a cluster of size b the susceptibility increases
by exactly $(a + b)^2 - (a^2 + b^2) = 2ab \le a^2 + b^2$. Since each round only involves merges between
disjoint pairs of clusters, this immediately implies that the total additive increase in susceptibility
is bounded by the current sum of squares of the cluster sizes, i.e., the current susceptibility $\chi_t$.
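
The two facts just stated, namely that the product of the susceptibility and the number of clusters is at least 1 and that a round of disjoint merges can at most double the susceptibility, are easy to check on a small example. The Python sketch below uses exact rational arithmetic; the function name `susceptibility` and the example sizes are hypothetical choices made for this illustration.

```python
from fractions import Fraction


def susceptibility(sizes):
    """Sum of squares of the cluster-sizes after normalizing them to sum to 1."""
    total = sum(sizes)
    return sum(Fraction(s, total) ** 2 for s in sizes)


# Four clusters of sizes 1, 2, 3, 4 (total mass 10) ...
before = susceptibility([1, 2, 3, 4])
# ... after one round with the disjoint merges (1, 2) -> 3 and (3, 4) -> 7.
after = susceptibility([3, 7])
```

Here the increase `after - before` equals $(2 \cdot 1 \cdot 2 + 2 \cdot 3 \cdot 4)/10^2$, one term $2ab$ per merge, and `after` stays below `2 * before`.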

Before commencing with the proof of Proposition 3.1, we present a trivial linear bound for the
expected running time of the coalescence process, which will later serve as the final step in our
proof. Here and in what follows, $P_w$ and $E_w$ denote probability and expectation given the initial
cluster distribution w. While the estimate featured here appears to be quite crude when w is
uniform, recall that in general $\tau_c$ can in fact be linear in the initial number of clusters w.h.p., e.g.
when w is comprised of one cluster of mass $1 - 1/\sqrt{n}$ and $\sqrt{n}$ other clusters of mass $1/n$ each.

Lemma 3.2. Starting from k clusters with an arbitrary cluster distribution $w = (w_1, \ldots, w_k)$ we
have $E_w[\tau_c] \le 8k$. Furthermore, $P_w(\tau_c > 16k) \le e^{-k/4}$.

Proof. Consider an arbitrary round in which at least 2 clusters still remain. We claim that the
probability that there is at least one merge in this round is at least $\tfrac18$. Indeed, let $C_1$ be a cluster
of minimal size: The probability that it decides to send a request is $\tfrac12$, and since there are at least
two clusters and $C_1$ is the smallest one, the probability that this request goes to some $C_j$ with $j \ne 1$
is at least $\tfrac12$. Finally, the probability that $C_j$ is accepting requests is again $\tfrac12$. Conditioned on these
events, $C_j$ will definitely accept some request (possibly not the one from $C_1$ as another cluster of
the same size as $C_1$ may have sent it a request) leading to at least one merge, as claimed.

The process terminates when the total cumulative number of merges reaches k − 1. Therefore,
the time of completion is stochastically dominated by the sum of k − 1 geometric random variables
with success probability $\tfrac18$, and in particular $E_w[\tau_c] \le 8(k - 1)$.

By the same reasoning, the total number of merges that occurred in the first t rounds clearly
stochastically dominates a binomial variable $\mathrm{Bin}(t, \tfrac18)$ as long as $t < \tau_c$. Therefore,

$$P_w(\tau_c > 16k) \le P\left(\mathrm{Bin}(16k, \tfrac18) < k - 1\right) \le e^{-k/4},$$

where the last inequality used the well-known Chernoff bounds (see e.g. [17, Theorem 2.1]). ■
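
The binomial tail in the last step can be computed exactly for small k and compared with a Chernoff-type estimate. The sketch below is illustrative only: the helper name `binomial_lower_tail` is hypothetical, and the comparison value $\exp(-k/4)$ is an assumption of this sketch, obtained from the standard lower-tail bound $\exp(-t^2/(2\mu))$ with mean $\mu = 2k$ and deviation $t = k$.

```python
import math


def binomial_lower_tail(n, p, m):
    """P(Bin(n, p) < m), summed exactly over the outcomes 0, ..., m - 1."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(m))


def tail_and_bound(k):
    """Exact P(Bin(16k, 1/8) < k - 1) next to the estimate exp(-k/4)."""
    exact = binomial_lower_tail(16 * k, 1 / 8, k - 1)
    return exact, math.exp(-k / 4)
```

Calling `tail_and_bound(k)` for a few values of k shows that the exact tail is far smaller than the estimate, consistent with the remark that the constants are not optimized.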

3.1. Proof of Proposition 3.1 via two key lemmas. We next present the two main lemmas
on which the proof of the proposition hinges. The key idea is to design a potential function
comprised of two parts $\Phi_1, \Phi_2$ while identifying a certain event $A_t$ such that the following holds:
$E[\Phi_1(t+1) - \Phi_1(t) \mid \mathcal{F}_t, A_t] < c_1 < 0$ and $E[\Phi_2(t+1) - \Phi_2(t) \mid \mathcal{F}_t] < c_2$ where $c_1, c_2$ are absolute
constants, and a similar statement holds conditioned on $A_t^c$ when reversing the roles of $\Phi_1$ and $\Phi_2$.
At this point we will establish that an appropriate linear combination of $\Phi_1, \Phi_2$ is a supermartingale,
and the required bound on $\tau_c$ will follow from Optional Stopping. Note that throughout the proof
we make no attempt to optimize the absolute constants involved. The event $A_t$ of interest is defined
as follows:

Definition 3.3. Let $A_t$ be the event that the following two properties hold after the t-th round:
(i) At least $\kappa_t/2$ clusters have size at most $1/(600\kappa_t)$.
(a) The cluster-size distribution satisfies Ylii'^i^{wi<ii/Kt} < 10~^. 

The intuition behind this definition is that Property (i) boosts the number of tiny clusters,
thereby severely retarding the growth of the largest clusters, which will tend to see incoming
requests from these tiny clusters. Property (ii) ensures that most of the mass of the cluster-size
distribution is on relatively large clusters, of size at least 41 times the average.

Examining the event $A_t$ will aid in tracking the variable $\chi_t \kappa_t$, the normalized susceptibility
(recall from (3.2) that this quantity is always at least 1 and it equals 1 whenever all clusters are
of the same size). The next lemma, whose proof appears in §3.2, estimates the expected change in
this quantity and most notably shows that it is at most —-^ if we condition on At. 

Lemma 3.4. Let $\Phi_1(t) = \chi_t \kappa_t$ and suppose that at the end of the t-th round one has $\kappa_t \ge 2$. Then

$$E[\Phi_1(t+1) - \Phi_1(t) \mid \mathcal{F}_t] \le 5 \qquad (3.4)$$

and furthermore 

E[^i{t + l)-^i{t)\Tt,At,Xt<3-W-''] <-2So- (3-5) 

Fortunately, when $A_t$ does not hold the behavior in the next round can still be advantageous in
the sense that in this case the number of clusters tends to fall by at least a constant fraction. This
is established by the following lemma, whose proof is postponed to §3.3.

Lemma 3.5. Let $\Phi_2(t) = \log \kappa_t$ and suppose that after the t-th round one has $\kappa_t \ge 2$. Then

E [<i>2{t + 1) - $2(0 I J^t , ^t] < -2 • 10"^ . (3.6) 

We are now in a position to derive Proposition 3.1 from the above two lemmas.

Proof of Proposition 3.1. Define the stopping time $\tau$ to be

r = min : ^ 3 • 10^^} . 

Observe that the susceptibility is initially $1/n$, its value is 1 once the process arrives at a single
cluster (i.e. at time $\tau_c$) and until that point it is nondecreasing, hence $E\tau \le E\tau_c < \infty$ by Lemma 3.2.
Further define the random variable

Zt = XtKt + 3 ■ log 1^1 + ^. 

We claim that (ZtAr) is a supermartingale. Indeed, consider E[Zf+i \ Tt ,t > t] and note that the 
fact that T > t implies in particular that Kt ^ since in that case xt < 3 ■ 10^^ < 1. 

• If At holds then by (|3.5p the conditional expected change in xt^t is below — ggo' while log Kt 
can only decrease (as Kt is non-increasing), hence E[Zt_|_i \ J^t ,At ,t > t] < Zt- 

• If At does not hold then by ()3.4p the conditional expected change in Xt>^t is at most -|-5 
whereas the conditional expected change in logKt is below —2 • 10^^ due to (j3.6p . By the 
scaling in the definition of Zt these add up to give E[Z(_|_i | 7"^ , , r > t] < — 
Altogether, $(Z_{t \wedge \tau})$ is indeed a supermartingale. As its increments are bounded and the stopping
time $\tau$ is integrable we can apply the Optional Stopping Theorem (see e.g. [3, Chapter 5]) and get

EZr <Zo = xo^o + 3-10^ log Ko = 0(log n) . (3.7) 

At the same time, by definition of r we have ^ 3 • 10^'^ and so 

Zr = XrKr + 3 • 10^ log + ^ > 3 • 10"^ (k, + t/8) . (3.8) 

Taking expectation in (3.8) and combining it with (3.7) we find that

$$E[\tau + 8\kappa_\tau] \le O(\log n).$$

Finally, conditioned on the cluster distribution at time $\tau$ we know by Lemma 3.2 that the expected
number of additional rounds it takes the process to conclude is at most $8\kappa_\tau$, thus $E[\tau_c] \le E[\tau + 8\kappa_\tau]$.
We can now conclude that $E[\tau_c] = O(\log n)$, as required. ■

3.2. Proof of Lemma 3.4: Estimating the normalized susceptibility when $A_t$ holds.

The first step in controlling the product $\chi_t \kappa_t$ is to quantify the coalescence rate in terms of the
susceptibility, as achieved by the following claim.

Claim 3.6. Suppose that at the end of the t-th round one has $\kappa_t \ge 2$. Then

$$E[\kappa_{t+1} \mid \mathcal{F}_t] \le \kappa_t - (46\chi_t)^{-1} \qquad (3.9)$$

and furthermore 

P [Kt+i <Kt- (lOOxt)-' I -Ft , < 3 • 10-^) > 1 - e-i°o . 

Proof. To simplify the notation let $\kappa = \kappa_t$, $\chi = \chi_t$ and $\kappa' = \kappa_{t+1}$ throughout the proof of the claim.
Further let the clusters $C_i$ be indexed in increasing order of their sizes and let $w_i = |C_i|$.

Recall that the number of merges in round t + 1 is precisely the number of clusters which decide
to accept requests and then receive at least one incoming request from a cluster of size no larger
than itself. Consider the probability of the latter event for a cluster $C_i$ with $i > \lfloor \kappa/2 \rfloor$. Since the
clusters are ordered by size there are at least $\lfloor \kappa/2 \rfloor$ clusters of size at most $w_i$ and each will send
a request to $C_i$ independently with probability $w_i/2$ (the factor of 2 is due to the probability of
issuing rather than receiving requests this round). The probability that none of these clusters do so
is thus at most $(1 - w_i/2)^{\lfloor \kappa/2 \rfloor} \le e^{-w_i \kappa/6}$ (where we used the fact that $\lfloor \kappa/2 \rfloor \ge \kappa/3$ for any $\kappa \ge 2$),
and altogether the probability that $C_i$ accepts a merge request from one of these clusters is at least
$\tfrac12 (1 - e^{-w_i \kappa/6})$. Summing over these clusters we conclude that

$$E[\kappa - \kappa' \mid \mathcal{F}_t] \ge \sum_{i > \lfloor \kappa/2 \rfloor} \tfrac12 \left(1 - e^{-w_i \kappa/6}\right) \ge \sum_{i=1}^{\kappa} \tfrac14 \left(1 - e^{-w_i \kappa/6}\right),$$

where the last inequality follows from the fact that the summand is increasing in $w_i$ and hence
the sum over the $\lceil \kappa/2 \rceil$ largest clusters should be at least as large as the sum over the $\lfloor \kappa/2 \rfloor$
smallest ones. Next, observe that by concavity, for all $0 \le w_i \le 6\chi$ the final summand is at least
$w_i \cdot \tfrac14 (1 - e^{-\chi\kappa})/(6\chi)$, which in turn is at least $w_i \cdot \tfrac14 (1 - e^{-1})/(6\chi)$ by Eq. (3.2). As this last
expression always exceeds $w_i/(38\chi)$ we get

$$E[\kappa - \kappa' \mid \mathcal{F}_t] \ge \frac{1}{38\chi} \sum_{w_i \le 6\chi} w_i. \qquad (3.10)$$
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We now aim to show that much of the overall mass is spread on clusters of size at most 6x- To this 
end recall that by definition X = while — 1; hence we can write x = where Y is 

the random variable that accepts the value Wi with probability Wi for i = 1, . . . , k. This gives that 

w^ = F{Y < 6EY) > ^ , 

Wi<ex 

(with the final bound is due to Markov's inequality) and revisiting ()3.10p we obtain that 

E[k-k' \Tt]> —■-> —, 
^ ' *^ 38x 6 46x' 

establishing inequality ()3.9p . 

To complete the proof of the claim it suffices to show that the random variable $X=\kappa-\kappa'$ is suitably concentrated, to which end we use Talagrand's inequality (see, e.g., [21, Chapter 10]). In its following version we say that a function $f:\prod_i\Omega_i\to\mathbb{R}$ is $C$-Lipschitz if changing its argument $\omega$ in any single coordinate changes $f(\omega)$ by at most $C$, and that $f$ is $r$-certifiable if for every $s$ and $\omega$ with $f(\omega)\ge s$ there exists a subset $I$ of at most $rs$ coordinates such that every $\omega'$ that agrees with $\omega$ on the coordinates indexed by $I$ also has $f(\omega')\ge s$. In the context of a product space $\Omega=\prod_i\Omega_i$ these definitions carry to the random variable that $f$ corresponds to via the product measure.

Theorem 3.7 (Talagrand's inequality). If $X$ is a $C$-Lipschitz and $r$-certifiable random variable on $\Omega=\prod_{i=1}^n\Omega_i$ then $\mathbb{P}\left(|X-\mathbb{E}X| > t + 60C\sqrt{r\,\mathbb{E}X}\right) \le 4\exp\left(-t^2/(8C^2 r\,\mathbb{E}X)\right)$ for any $0\le t\le\mathbb{E}X$.

Observe that round $t+1$, conditioned on $\mathcal{F}_t$, is clearly a product space as the actions of the individual clusters are independent: Formally, each cluster chooses either to accept requests or to send a request to a random cluster. Changing the action of a single cluster can only affect $X$, the number of merges in round $t+1$, by at most one merge and so $X$ is 1-Lipschitz. Also, if $X\ge s$ then one can identify $s$ clusters which accepted merge requests from smaller clusters. By fixing the decisions of the $2s$ clusters comprising these merges (the acceptors together with their corresponding requesters) we must have $X\ge s$ regardless of the other clusters' actions, as the $s$ acceptors will accept (possibly different) merge-requests no matter what. Thus, $X$ is also 2-certifiable.

Let $\mu=\mathbb{E}X$ and assume now that $\chi<3\cdot 10^{-7}$. By the first part of the proof (Eq. (3.9)) it then follows that $\mu\ge(46\chi)^{-1}>70000$, in which case Talagrand's inequality gives

\[ \mathbb{P}\left(|X-\mu| > \frac{\mu}{6} + 60\sqrt{2\mu}\right) \le 4\exp\left(-(\mu/6)^2/(16\mu)\right) = 4e^{-\mu/576} < e^{-100}\,. \]

Also, note that our above bound $\mu>70000>2\cdot 180^2$ implies that

\[ 60\sqrt{2\mu} < \mu/3\,, \]

so in fact the probability of $X$ falling below $\mu-\left(\frac{\mu}{6}+\frac{\mu}{3}\right)=\frac{\mu}{2}$ is at most $e^{-100}$. As $\mu\ge(46\chi)^{-1}$ we conclude that $\kappa-\kappa'=X\ge(100\chi)^{-1}$ with probability at least $1-e^{-100}$, as required. ■
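The numerical constants in the proof of Claim 3.6 can be sanity-checked directly. The short script below is only an arithmetic check of the inequalities quoted above (38, 46, 70000, 576 and the $e^{-100}$ tail); the variable and function names are illustrative.

```python
import math

# Concavity step: (1/4) * (1 - e^{-1}) / 6 should exceed 1/38.
concavity_const = 0.25 * (1 - math.exp(-1)) / 6

# Markov step: a 5/6 fraction of the mass at rate 1/(38*chi) should give at least 1/(46*chi).
markov_const = (1 / 38) * (5 / 6)

# Lower bound on mu = E[X] when chi < 3e-7, via mu >= 1/(46*chi).
mu_min = 1 / (46 * 3e-7)

def talagrand_tail(mu):
    """4 * exp(-(mu/6)^2 / (16*mu)), i.e. Theorem 3.7 with C = 1, r = 2, t = mu/6."""
    return 4 * math.exp(-((mu / 6) ** 2) / (16 * mu))

checks = {
    "concavity": concavity_const > 1 / 38,
    "markov": markov_const >= 1 / 46,
    "mu_min": mu_min > 70000,
    "tail": talagrand_tail(70000) < math.exp(-100),
    "deviation": 60 * math.sqrt(2 * 70000) < 70000 / 3,
}
print(checks)
```

Each entry of `checks` corresponds to one inequality used in the proof; since the tail bound decreases in $\mu$ and $60\sqrt{2\mu}/\mu$ decreases in $\mu$, verifying them at $\mu=70000$ covers all larger values.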

As the above claim demonstrated the effect of the susceptibility on the coalescence rate, we move 
to study the evolution of the susceptibility. The critical advantage of the size-biased process is that 
large clusters grow more slowly than small clusters. The intuition behind this is that larger clusters 
tend to receive more requests, and since clusters choose to accept their smallest incoming request, 
these clusters typically have more choices to minimize over. It turns out that this effect is enough 
to produce a useful quantitative bound on the growth of the susceptibility.

Claim 3.8. Suppose that after the $t$-th round $\kappa_t\ge 2$. Then

\[ \mathbb{E}[\chi_{t+1}\mid\mathcal{F}_t] \le \chi_t + \frac{5}{\kappa_t}\,. \tag{3.11} \]

Proof. Set $\kappa=\kappa_t$ and $\chi=\chi_t$. Let the clusters $C_i$ be indexed in increasing order of their sizes and let $w_i=|C_i|$. For each cluster $C_i$ let the random variable $X_i$ be the size of the smallest cluster that it receives a merge request from, as long as that cluster is no larger than itself, and not itself; otherwise (the case where $C_i$ receives no merge requests from another cluster of size less than or equal to its own) set $X_i=0$. Under these definitions we have

\[ \mathbb{E}[\chi_{t+1}\mid\mathcal{F}_t] = \chi + \sum_{i=1}^{\kappa} w_i\,\mathbb{E}[X_i]\,, \tag{3.12} \]

since each $C_i$ is an acceptor with probability $\frac12$ and if it indeed accepts a request from a cluster of size $X_i$ then the susceptibility will increase by exactly $(w_i+X_i)^2-(w_i^2+X_i^2)=2w_iX_i$.

Next, note that since we ordered the clusters by increasing order of size, each of the first $\lfloor\kappa/2\rfloor$ clusters has size at most $2/\kappa$ (otherwise the last $\lceil\kappa/2\rceil$ clusters would combine to a total mass larger than 1). We will use this fact to bound $\mathbb{E}[X_i\mid\mathcal{F}_t]$ by considering two situations:

1. If $C_i$ receives an incoming request from at least one of the first $\lfloor\kappa/2\rfloor$ clusters (including itself) then $X_i\le 2/\kappa$ by the above argument. The probability of this is precisely $1-\left(1-\frac{w_i}{2}\right)^{\lfloor\kappa/2\rfloor}$, as each of the first $\lfloor\kappa/2\rfloor$ clusters $C_j$ independently sends a request to $C_i$ with probability $w_i/2$ (with the factor of 2 due to the decision of $C_j$ whether or not to issue requests).

2. If $C_i$ gets no requests from the first $\lfloor\kappa/2\rfloor$ clusters then use the trivial bound $X_i\le w_i$.

Combining the two cases we deduce that

\[ \mathbb{E}X_i \le \left[1-\left(1-\frac{w_i}{2}\right)^{\lfloor\kappa/2\rfloor}\right]\frac{2}{\kappa} + \left(1-\frac{w_i}{2}\right)^{\lfloor\kappa/2\rfloor} w_i\,. \tag{3.13} \]

We claim that $\mathbb{E}X_i$ is in fact always at most $5/\kappa$. To see this, first note that if $w_i\le 2/\kappa$ then this immediately holds, e.g. since $X_i\le w_i$. Consider therefore the case where $w_i>2/\kappa$. Since (3.13) is a weighted average of $2/\kappa$ and $w_i>2/\kappa$, it increases whenever the weight on $w_i$ is increased. As

\[ \left(1-\frac{w_i}{2}\right)^{\lfloor\kappa/2\rfloor} \le e^{-\frac{w_i}{2}\lfloor\kappa/2\rfloor} \le e^{-w_i\kappa/6}\,, \]

we have that in this case

\[ \mathbb{E}X_i \le \left(1-e^{-w_i\kappa/6}\right)\frac{2}{\kappa} + e^{-w_i\kappa/6}\,w_i \le \frac{1}{\kappa}\left(2 + w_i\kappa\,e^{-w_i\kappa/6}\right). \]

One can easily verify that the function $f(x)=xe^{-x/6}$ satisfies $f(x)<3$ for all $x$, hence we conclude that $\mathbb{E}X_i\le 5/\kappa$ in all cases, as claimed. Plugging this into equation (3.12) we obtain that

\[ \mathbb{E}[\chi_{t+1}\mid\mathcal{F}_t] \le \chi + \frac{5}{\kappa}\sum_i w_i = \chi + \frac{5}{\kappa}\,, \]

as required. ■
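As a numerical illustration of the bound $\mathbb{E}X_i\le 5/\kappa$, the sketch below evaluates the right-hand side of (3.13) on a grid of normalized sizes $w\in(0,1]$ and checks that $\kappa$ times it stays below 5; it also locates the maximum of $f(x)=xe^{-x/6}$, which is $6/e$. The grid resolutions and function names are arbitrary choices made for this example.

```python
import math

def ex_bound(w, kappa):
    """Right-hand side of (3.13) for a cluster of normalized size w among kappa clusters."""
    # Probability of receiving no request from the first floor(kappa/2) clusters.
    miss = (1 - w / 2) ** (kappa // 2)
    return (1 - miss) * (2 / kappa) + miss * w

def max_scaled_bound(kappa, grid=5000):
    """Maximum over the grid w = 1/grid, ..., 1 of kappa * ex_bound(w, kappa)."""
    return max(kappa * ex_bound(j / grid, kappa) for j in range(1, grid + 1))

def max_x_exp(step=0.01, upper=60.0):
    """Grid maximum of f(x) = x * exp(-x/6) on [0, upper]."""
    count = int(upper / step)
    return max((j * step) * math.exp(-(j * step) / 6) for j in range(count + 1))

for kappa in (2, 3, 5, 10, 101, 1000):
    print(kappa, round(max_scaled_bound(kappa), 4))
print(round(max_x_exp(), 4), round(6 / math.e, 4))
```

The grid maxima stay well under 5, which matches the slack between $2+6/e$ and the constant 5 used in the claim.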



While the last claim allows us to limit the growth of the susceptibility, this bound is unfortunately too weak in general. For instance, when used in tandem with Claim 3.6 it results in the susceptibility growing out of control, while the number of clusters decreases slower and slower. Crucially however, conditioned on the event $A_t$ (as given in Definition 3.3) we can refine these bounds to show that the growth of $\chi_{t+1}$ slows down dramatically, as the following claim establishes.

Claim 3.9. Suppose that at the end of the $t$-th round $\kappa_t\ge 2$. Then

\[ \mathbb{E}[\chi_{t+1}\mid\mathcal{F}_t,\ A_t] \le \chi_t + (201\kappa_t)^{-1}\,. \tag{3.14} \]



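Proof. Let $\kappa=\kappa_t$ and $\chi=\chi_t$, and define the random variables $X_i$ as in the proof of Claim 3.8. By the same reasoning used to deduce inequality (3.13), only now using Property (i) of $A_t$ according to which each of the smallest $\lceil\kappa/2\rceil$ clusters has size at most $1/(600\kappa)$, we have

\[ \mathbb{E}X_i \le \left[1-\left(1-\frac{w_i}{2}\right)^{\lceil\kappa/2\rceil}\right]\frac{1}{600\kappa} + \left(1-\frac{w_i}{2}\right)^{\lceil\kappa/2\rceil} w_i\,. \tag{3.15} \]

Recall that equation (3.12) established that $\mathbb{E}[\chi_{t+1}\mid\mathcal{F}_t]=\chi+\sum_{i=1}^{\kappa}w_i\,\mathbb{E}X_i$. This time we will need to bound this sum more delicately by splitting it into two parts based on whether or not $w_i<41/\kappa$. In the case $w_i<41/\kappa$ we can use the trivial bound $X_i\le w_i$ to arrive at

\[ \sum_i w_i\mathbf{1}_{\{w_i<41/\kappa\}}\,\mathbb{E}X_i \le \sum_i w_i\mathbf{1}_{\{w_i<41/\kappa\}}\,\frac{41}{\kappa} < 4\cdot 10^{-5}\cdot\frac{41}{\kappa}\,, \]

where the last inequality is by Property (ii) of $A_t$. For the second part of the summation we use the same weighted mean argument from the proof of Claim 3.8 to deduce that when $w_i\ge(600\kappa)^{-1}$ the r.h.s. of (3.15) increases with the weight on $w_i$, which in turn is at most $\left(1-\frac{w_i}{2}\right)^{\lceil\kappa/2\rceil}\le\exp(-w_i\kappa/4)$. In particular, in case $w_i\ge 41/\kappa$ we have

\[ \mathbb{E}X_i \le \left(1-e^{-w_i\kappa/4}\right)\frac{1}{600\kappa} + e^{-w_i\kappa/4}\,w_i \le \frac{1}{\kappa}\left(\frac{1}{600} + w_i\kappa\,e^{-w_i\kappa/4}\right) \le \frac{1}{\kappa}\left(\frac{1}{600} + 41e^{-41/4}\right) \]

(here we used the fact that the function $xe^{-x/4}$ is decreasing for $x\ge 41$). Combining our bounds,

\[ \sum_{i=1}^{\kappa} w_i\,\mathbb{E}X_i \le \frac{1}{\kappa}\left(4\cdot 10^{-5}\cdot 41 + \sum_i w_i\mathbf{1}_{\{w_i\ge 41/\kappa\}}\left(\frac{1}{600}+41e^{-41/4}\right)\right) \le \frac{1}{201\kappa}\,, \]

since $\sum_i w_i=1$. Together with equation (3.12) this completes the proof. ■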
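The final inequality in the proof of Claim 3.9 is a statement about explicit constants, so it can be checked by direct arithmetic. The snippet below is such a check, using only the numbers that appear in the proof; the variable names are illustrative.

```python
import math

# Contribution of clusters with w_i < 41/kappa, via Property (ii) of A_t.
small_part = 4e-5 * 41

# Per-unit-mass contribution of clusters with w_i >= 41/kappa.
large_part = 1 / 600 + 41 * math.exp(-41 / 4)

# Coefficient of 1/kappa in the bound on sum_i w_i * E[X_i]
# (the mass on large clusters is at most 1).
total = small_part + large_part

def x_exp_quarter(x):
    """The function x * exp(-x/4), which is decreasing for x >= 4."""
    return x * math.exp(-x / 4)

print(total, 1 / 201, total < 1 / 201)
```

The margin is small (roughly 0.00476 against 0.00498), which suggests the constants 600, 41 and $4\cdot 10^{-5}$ in Definition 3.3 were tuned with this step in mind.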

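Combining the bound on $\kappa_{t+1}$ in Claim 3.6 with the bounds on $\chi_{t+1}$ from Claims 3.8 and 3.9 will now result in the statement of Lemma 3.4:

Proof of Lemma 3.4. For convenience let $\kappa=\kappa_t$ and $\chi=\chi_t$, as well as $\kappa'=\kappa_{t+1}$ and $\chi'=\chi_{t+1}$. The first statement of the lemma is an immediate consequence of Claim 3.8 since $\kappa'\le\kappa$ and so

\[ \mathbb{E}[\chi'\kappa'\mid\mathcal{F}_t] \le \kappa\,\mathbb{E}[\chi'\mid\mathcal{F}_t] \le \kappa\left(\chi+\frac{5}{\kappa}\right) = \chi\kappa + 5\,. \]

For the second statement, since we can break down $\chi'\kappa'$ into

\[ \chi'\kappa' = \chi'\left(\kappa-\frac{1}{100\chi}\right) + \chi'\left(\kappa'-\kappa+\frac{1}{100\chi}\right)\mathbf{1}_{\{\kappa'>\kappa-\frac{1}{100\chi}\}} + \chi'\left(\kappa'-\kappa+\frac{1}{100\chi}\right)\mathbf{1}_{\{\kappa'\le\kappa-\frac{1}{100\chi}\}}\,, \]

noticing that the last expression in the r.h.s. is at most 0, and recalling that $\chi\le\chi'\le 2\chi$ (due to Eq. (3.3)) and $1\le\kappa'\le\kappa$, we now obtain that $\mathbb{E}[\chi'\kappa'\mid\mathcal{F}_t,\ A_t,\ \chi<3\cdot 10^{-7}]$ is at most

\[ \mathbb{E}\left[\chi'\left(\kappa-\frac{1}{100\chi}\right)\,\middle|\,\mathcal{F}_t,\ A_t,\ \chi<3\cdot 10^{-7}\right] + \mathbb{E}\left[2\chi\cdot\frac{1}{100\chi}\,\mathbf{1}_{\{\kappa'>\kappa-\frac{1}{100\chi}\}}\,\middle|\,\mathcal{F}_t,\ A_t,\ \chi<3\cdot 10^{-7}\right] \]
\[ = \left(\kappa-\frac{1}{100\chi}\right)\mathbb{E}\left[\chi'\,\middle|\,\mathcal{F}_t,\ A_t,\ \chi<3\cdot 10^{-7}\right] + \frac{1}{50}\,\mathbb{P}\left(\kappa'>\kappa-\frac{1}{100\chi}\,\middle|\,\mathcal{F}_t,\ A_t,\ \chi<3\cdot 10^{-7}\right). \]

Applying Claims 3.6 and 3.9 now gives

\[ \mathbb{E}\left[\chi'\kappa'\,\middle|\,\mathcal{F}_t,\ A_t,\ \chi<3\cdot 10^{-7}\right] \le \left(\kappa-\frac{1}{100\chi}\right)\left(\chi+\frac{1}{201\kappa}\right) + \frac{1}{50}e^{-100} \le \chi\kappa - \frac{1}{100} + \frac{1}{201} + \frac{1}{50}e^{-100} \le \chi\kappa - \frac{1}{200}\,, \]

completing the proof. ■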

3.3. Proof of Lemma 3.5: Estimating the number of components when $A_t$ fails. We wish to show that whenever either one of the two properties specified in $A_t$ does not hold, the expected number of clusters drops by a constant factor.

Suppose that Property (i) of $A_t$ fails: In this case a constant fraction of the clusters have size which is at least a constant fraction of the average size $1/\kappa_t$. We will show that each such cluster receives an incoming request (from another cluster of no larger size) in the next round with a probability that is uniformly bounded from below. Consequently, we will be able to conclude that the number of clusters shrinks by at least a constant factor in expectation.

Claim 3.10. Suppose that at the end of the $t$-th round $\kappa_t\ge 2$ and Property (i) of $A_t$ does not hold, i.e. more than $\kappa_t/2$ clusters have size greater than $(600\kappa_t)^{-1}$. Then

\[ \mathbb{E}[\kappa_{t+1}\mid\mathcal{F}_t] \le \left(1-5\cdot 10^{-5}\right)\kappa_t\,. \tag{3.16} \]

Proof. Let $\kappa=\kappa_t$ and $\kappa'=\kappa_{t+1}$ and as usual, order the clusters by increasing order of size. Consider an arbitrary cluster $C_i$ which is one of the last $\lceil\kappa/2\rceil$ clusters, and let $w_i$ denote its size. If $C_i$ opts to accept requests in this round (with probability $\frac12$) and any of the first $\lfloor\kappa/2\rfloor$ clusters sends it a request, it will contribute a merge in this round. This occurs with probability

\[ \frac12\left(1-\left(1-\frac{w_i}{2}\right)^{\lfloor\kappa/2\rfloor}\right) \ge \frac12\left(1-e^{-w_i\kappa/6}\right) \ge \frac12\left(1-e^{-1/3600}\right) \ge 10^{-4}\,, \]

where we used our assumption that $w_i\ge(600\kappa)^{-1}$. Thus, the probability that $C_i$ contributes to a merge is at least $10^{-4}$. We conclude that the expected number of merges in this round is at least $10^{-4}\lceil\kappa/2\rceil$, from which the desired result follows. ■

Now suppose that Property (ii) of $A_t$ fails: Here at least a constant proportion of the mass of the cluster-size distribution falls on clusters with size at most a constant multiple of the average size. Such clusters behave nicely as in this window the relation between the cluster-size and the typical number of incoming requests can be bounded by a linear function. Again, this will result in a constant proportion of clusters merging in the next round in expectation.

Claim 3.11. Suppose that at the end of the $t$-th round $\kappa_t\ge 2$ and Property (ii) of $A_t$ does not hold, i.e. $\sum_i w_i\mathbf{1}_{\{w_i<41/\kappa_t\}}\ge 4\cdot 10^{-5}$ where $w_i$ denotes the size of $C_i$. Then

\[ \mathbb{E}[\kappa_{t+1}\mid\mathcal{F}_t] \le \left(1-2\cdot 10^{-7}\right)\kappa_t\,. \tag{3.17} \]

Proof. Let $\kappa=\kappa_t$ and $\kappa'=\kappa_{t+1}$. Order the clusters by size and let $r$ be the number of clusters which are smaller than $41/\kappa$. Since clearly at most $\kappa/41$ clusters can have size at least $41/\kappa$ we have $r\ge\lceil\frac{40}{41}\kappa\rceil$. Notice that since $\kappa\ge 2$ this implies that in particular $\lfloor r/2\rfloor\ge\kappa/3$. By the same arguments as before, each cluster $C_i$ with $\lfloor r/2\rfloor<i\le r$ will accept a merge request from a smaller cluster with probability at least

\[ \frac12\left(1-\left(1-\frac{w_i}{2}\right)^{\lfloor r/2\rfloor}\right) \ge \frac12\left(1-e^{-w_i\kappa/6}\right). \]

Since we are concentrating our attention on the clusters of size $w_i<41/\kappa$, concavity implies that the last expression is actually at least

\[ \frac12\left(1-e^{-41/6}\right)\frac{w_i}{41/\kappa} \ge \frac{w_i\kappa}{100}\,. \]

We conclude that the expected number of merges in this round is at least

\[ \frac{\kappa}{100}\sum_{i=\lfloor r/2\rfloor+1}^{r} w_i \ge \frac{\kappa}{100}\cdot\frac12\sum_{i=1}^{r} w_i \ge \frac{\kappa}{100}\cdot\frac12\cdot 4\cdot 10^{-5} = 2\cdot 10^{-7}\,\kappa\,, \]

where we used the fact that the $w_i$'s are sorted in increasing order to relate the sum over the cluster indices $\lfloor r/2\rfloor+1,\ldots,r$ to the one over the first $r$ clusters. This gives the desired result. ■
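The constants behind Claims 3.10 and 3.11 can be verified with a few lines of arithmetic. The sketch below checks the $10^{-4}$ merge probability of Claim 3.10, the chord slope $1/100$ used in Claim 3.11, and the combinatorial fact $\lfloor r/2\rfloor\ge\kappa/3$ for the smallest admissible $r=\lceil 40\kappa/41\rceil$; the function and variable names are illustrative.

```python
import math

# Claim 3.10: a cluster of size at least 1/(600*kappa) merges with probability >= 1e-4.
p_310 = 0.5 * (1 - math.exp(-1 / 3600))

# Claim 3.11: slope of the chord of (1/2)(1 - e^{-x/6}) over [0, 41] is at least 1/100.
slope_311 = 0.5 * (1 - math.exp(-41 / 6)) / 41

def floor_half_r_ok(kappa):
    """Check floor(r/2) >= kappa/3 for r = ceil(40*kappa/41), using integer arithmetic."""
    r = -(-40 * kappa // 41)  # integer ceiling of 40*kappa/41
    return 3 * (r // 2) >= kappa

# Expected merges in Claim 3.11: (kappa/100) * (1/2) * 4e-5 = 2e-7 * kappa.
merge_rate_311 = (1 / 100) * 0.5 * 4e-5

print(p_310, slope_311, merge_rate_311)
print(all(floor_half_r_ok(k) for k in range(2, 100001)))
```

Only $\kappa\in\{2,3\}$ are tight for the floor condition; for $\kappa\ge 4$ the inequality has room to spare.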

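Proof of Lemma 3.5. The proof readily follows from the combination of Claims 3.10 and 3.11. Indeed, these claims establish that whenever the event $A_t$ fails we have

\[ \mathbb{E}[\kappa_{t+1}\mid\mathcal{F}_t,\ A_t^c] \le \left(1-2\cdot 10^{-7}\right)\kappa_t\,. \]

Therefore, by the concavity of the logarithm, Jensen's inequality implies that

\[ \mathbb{E}[\log\kappa_{t+1}\mid\mathcal{F}_t,\ A_t^c] \le \log\mathbb{E}[\kappa_{t+1}\mid\mathcal{F}_t,\ A_t^c] \le \log\kappa_t + \log\left(1-2\cdot 10^{-7}\right) \le \log\kappa_t - 2\cdot 10^{-7}\,, \]

as required. ■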
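With Lemmas 3.4 and 3.5 in hand, the supermartingale property of $Z_t$ reduces to a two-case drift computation. The sketch below spells it out under the assumption that $Z_t=\chi_t\kappa_t+M\log\kappa_t+t/200$ with $M=3\cdot 10^{7}$, as read from the proof of Proposition 3.1; the function name and the way the cases are encoded are illustrative only.

```python
M = 3e7          # scaling of the log(kappa_t) term in Z_t
STEP = 1 / 200   # deterministic increase of the t/200 term per round (assumed form)

def drift_bound(a_t_holds):
    """Upper bound on E[Z_{t+1} - Z_t | F_t] in each of the two cases."""
    if a_t_holds:
        # Lemma 3.4: chi*kappa drops by at least 1/200; log(kappa) cannot increase.
        return -1 / 200 + M * 0.0 + STEP
    # Lemma 3.4 (first part): chi*kappa grows by at most 5;
    # Lemma 3.5: log(kappa) drops by at least 2e-7 in expectation.
    return 5 + M * (-2e-7) + STEP

print(drift_bound(True), drift_bound(False))
```

The first case is exactly balanced by the $t/200$ term, while the second case has a drift of about $-0.995$, which is why the factor $M$ must exceed $5/(2\cdot 10^{-7})$.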

4. Optimal upper bound for size-biased process 

We now prove the upper bound in Theorem 1 by building upon the ideas of the previous section. Recall that in the proof of Proposition 3.1 we defined the sequence

\[ Z_t = \chi_t\kappa_t + M\log\kappa_t + \frac{t}{200} \quad\text{where } M = 3\cdot 10^{7}, \]

established that it was a supermartingale and derived the required result from Optional Stopping. That approach was only enough to produce a bound on $\mathbb{E}[\tau_c]$, the expected completion time. For the stronger result on the typical value of $\tau_c$ we will analyze $(Z_t)$ more delicately. Namely, we estimate its increments in $L^2$ to qualify an application of an appropriate Bernstein-Kolmogorov large-deviation inequality for supermartingales due to Freedman [13].

An important element in our proof is the modification of the above given variable $Z_t$ into an overestimate $Y_t$ which allows far better control over the increments in $L^2$. This is defined as follows:

\[ Y_0 = Z_0 = \chi_0\kappa_0 + M\log\kappa_0 = 1 + M\log n\,, \tag{4.1} \]

\[ Y_{t+1} = \begin{cases} Y_t + \left(\Xi_{t+1}\wedge\log^{2/3}n\right) + M\log\dfrac{\kappa_{t+1}}{\kappa_t} + \dfrac{1}{200} & \text{if } \tau_c>t\,,\\[4pt] Y_t & \text{if } \tau_c\le t\,, \end{cases} \]

where

\[ \Xi_{t+1} = \chi_{t+1}\left(\kappa_{t+1}\vee\left(\kappa_t-\frac{1}{100\chi_t}\right)\right) - \chi_t\kappa_t\,. \]

The purpose of the $\left(\kappa_t-\frac{1}{100\chi_t}\right)$ term is to limit the potential decrease from negative $\Xi$. In this section, we will need two-sided estimates (in addition to one-sided bounds such as those used in the previous section) due to the fact that we must control the increments.

It is clear that $Y_{t+1}-Y_t\ge Z_{t+1}-Z_t$ as long as $t<\tau_c$ and $\Xi_{t+1}\le\log^{2/3}n$. Therefore, setting

\[ \bar\tau = \min\{t : \Xi_{t+1} > \log^{2/3} n\}\,, \]

it follows that

\[ Y_t \ge Z_t \quad\text{for all } t\le\tau_c\wedge\bar\tau\,. \tag{4.2} \]

In what follows we will establish a large deviation estimate for $(Y_t)$, then use this overestimate for $Z_t$ to show that w.h.p. $\tau_c = O(\log n)$. We thus focus our attention on the sequence $(Y_t)$.

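Lemma 4.1. The sequence $(Y_t)$ is a supermartingale.

Proof. Since by definition $Y_t=Y_{t\wedge\tau_c}$ it suffices to consider the times $t<\tau_c$. As we clearly have $\left(\kappa_{t+1}\vee\left(\kappa_t-\frac{1}{100\chi_t}\right)\right)\le\kappa_t$ and Claim 3.8 established that $\mathbb{E}[\chi_{t+1}\mid\mathcal{F}_t]\le\chi_t+\frac{5}{\kappa_t}$ we can deduce that

\[ \mathbb{E}[\Xi_{t+1}\mid\mathcal{F}_t] \le 5\,. \tag{4.3} \]

Combined with Lemma 3.5 as in the proof of Proposition 3.1 it then follows that

\[ \mathbb{E}[Y_{t+1}\mid\mathcal{F}_t,\ A_t^c] \le Y_t\,. \]

We turn to consider $\mathbb{E}[Y_{t+1}\mid\mathcal{F}_t,\ A_t]$. Since $\kappa_{t+1}\le\kappa_t$ holds for all $t$, it suffices to show that

\[ \mathbb{E}[\Xi_{t+1}\mid\mathcal{F}_t,\ A_t] \le -\frac{1}{200}\,. \]

Indeed, as in the proof of Lemma 3.4 we write

\[ \Xi_{t+1} \le \chi_{t+1}\left(\kappa_t-\frac{1}{100\chi_t}\right) + \chi_{t+1}\left[\left(\kappa_{t+1}\vee\left(\kappa_t-\frac{1}{100\chi_t}\right)\right) - \left(\kappa_t-\frac{1}{100\chi_t}\right)\right] - \chi_t\kappa_t\,, \]

which as before gives rise to the following:

\[ \mathbb{E}[\Xi_{t+1}\mid\mathcal{F}_t,\ A_t] \le -\frac{1}{100} + \frac{1}{201} + \frac{1}{50}e^{-100} \le -\frac{1}{200}\,, \]

and we conclude that $(Y_t)$ is indeed a supermartingale, as required. ■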



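Lemma 4.2. The increments of the supermartingale $(Y_t)$ are uniformly bounded in $L^2$. Namely, for every $t$ we have $\mathbb{E}[(Y_{t+1}-Y_t)^2\mid\mathcal{F}_t]\le 2M^2$ where $M=3\cdot 10^{7}$.

Proof. First observe that

\[ (Y_{t+1}-Y_t)^2 \le 3(\Xi_{t+1})^2 + 3\left(M\log\frac{\kappa_{t+1}}{\kappa_t}\right)^2 + 3\left(\frac{1}{200}\right)^2. \tag{4.4} \]

Since $\frac12\kappa_t\le\kappa_{t+1}\le\kappa_t$, we have $-M\log 2\le M\log\frac{\kappa_{t+1}}{\kappa_t}\le 0$, hence the last two expressions above sum to at most $\frac32M^2$ (with room to spare) and it remains to bound $\mathbb{E}[(\Xi_{t+1})^2\mid\mathcal{F}_t]=O(1)$ for a suitably small implicit constant.

Observe that when $\Xi_{t+1}\ge 0$ we must have $|\Xi_{t+1}|\le\chi_{t+1}\kappa_t-\chi_t\kappa_t$ since $\left(\kappa_{t+1}\vee\left(\kappa_t-\frac{1}{100\chi_t}\right)\right)\le\kappa_t$. Conversely, if $\Xi_{t+1}\le 0$ then necessarily $|\Xi_{t+1}|\le\chi_t\kappa_t-\chi_{t+1}\left(\kappa_t-\frac{1}{100\chi_t}\right)\le 1$, with the last inequality due to the fact that $\kappa_t\ge 1/\chi_t$ and $\chi_{t+1}\ge\chi_t$. Combining the cases we deduce that in particular

\[ |\Xi_{t+1}| \le \kappa_t(\chi_{t+1}-\chi_t) + 1\,. \]

By Claim 3.8 we have $\mathbb{E}[\chi_{t+1}-\chi_t\mid\mathcal{F}_t]\le 5/\kappa_t$, hence we get

\[ \mathbb{E}[(\Xi_{t+1})^2\mid\mathcal{F}_t] \le \kappa_t^2\,\mathbb{E}[(\chi_{t+1}-\chi_t)^2\mid\mathcal{F}_t] + 1 + 2\kappa_t(5/\kappa_t) \le \kappa_t^2\,\mathbb{E}[(\chi_{t+1}-\chi_t)^2\mid\mathcal{F}_t] + 11\,. \tag{4.5} \]

It remains to show that $\mathbb{E}[(\chi_{t+1}-\chi_t)^2\mid\mathcal{F}_t]=O(1/\kappa_t^2)$. To do so, let $w_1,\ldots,w_{\kappa_t}$ be the cluster-sizes after the $t$-th round and recall that by Eq. (3.12) and the arguments following it we have

\[ \chi_{t+1}-\chi_t = \sum_{i=1}^{\kappa_t} 2w_iX_iI_i\,, \]

where each $X_i$ is a non-negative random variable satisfying $\mathbb{E}X_i\le 5/\kappa_t$ (marking the size of another cluster of no larger size that issued a request to $C_i$, or 0 if there was no such cluster) and each $I_i$ is a Bernoulli($\frac12$) variable independent of $X_i$ (indicating whether or not $C_i$ chose to accept requests). Since $\sum_i w_i=1$, it follows from convexity that

\[ (\chi_{t+1}-\chi_t)^2 = \Big(\sum_{i=1}^{\kappa_t} w_i\,(2X_iI_i)\Big)^2 \le \sum_{i=1}^{\kappa_t} w_i\,(2X_iI_i)^2\,, \]

hence, taking expectation while recalling that $I_i$ and $X_i$ are independent,

\[ \mathbb{E}[(\chi_{t+1}-\chi_t)^2\mid\mathcal{F}_t] \le 4\sum_{i=1}^{\kappa_t} w_i\,(\mathbb{E}X_i^2)\,\mathbb{E}(I_i) = 2\sum_{i=1}^{\kappa_t} w_i\,\mathbb{E}X_i^2\,, \]

and it remains to bound $\mathbb{E}X_i^2$. Following the same argument that led to (3.13) now gives

\[ \mathbb{E}X_i^2 \le \left[1-\left(1-\frac{w_i}{2}\right)^{\lfloor\kappa_t/2\rfloor}\right]\left(\frac{2}{\kappa_t}\right)^2 + \left(1-\frac{w_i}{2}\right)^{\lfloor\kappa_t/2\rfloor} w_i^2\,. \]

As before, we now deduce that either $w_i\le 2/\kappa_t$, in which case clearly $\mathbb{E}X_i^2\le 4/\kappa_t^2$, or we have

\[ \mathbb{E}X_i^2 \le \frac{4}{\kappa_t^2} + e^{-w_i\kappa_t/6}\,w_i^2 = \frac{1}{\kappa_t^2}\left(4 + (w_i\kappa_t)^2e^{-w_i\kappa_t/6}\right). \]

Since $x^2\exp(-x/6)<20$ for all $x\ge 0$ it then follows that $\mathbb{E}X_i^2\le 24/\kappa_t^2$ (with room to spare). Either way we deduce that

\[ \mathbb{E}[(\chi_{t+1}-\chi_t)^2\mid\mathcal{F}_t] \le 2\sum_i\left(w_i\cdot 24/\kappa_t^2\right) = 48/\kappa_t^2\,, \]

and so, going back to (4.5),

\[ \mathbb{E}[(\Xi_{t+1})^2\mid\mathcal{F}_t] \le 48 + 11 < 60\,. \tag{4.6} \]

Using this bound in (4.4) we can conclude the proof as we have

\[ \mathbb{E}[(Y_{t+1}-Y_t)^2\mid\mathcal{F}_t] \le 3\,\mathbb{E}[(\Xi_{t+1})^2\mid\mathcal{F}_t] + \tfrac32M^2 \le 2M^2\,. \qquad ■ \]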

By now we have established that $(Y_t)$ is a supermartingale which satisfies $Y_{t+1}-Y_t\le L$ for a value of $L=\log^{2/3}n+\frac{1}{200}$, and that in addition $\mathbb{E}[(Y_{t+1}-Y_t)^2\mid\mathcal{F}_t]\le 2M^2$. We are now in a position to apply the following inequality due to Freedman [13]; we note that this result was originally stated for martingales yet its proof, essentially unmodified, extends also to supermartingales.

Theorem 4.3 ([13, Theorem 1.6]). Let $(S_i)$ be a supermartingale with respect to a filter $(\mathcal{F}_i)$. Suppose $S_i-S_{i-1}\le L$ for all $i$, and write $V_t=\sum_{i=1}^{t}\mathbb{E}[(S_i-S_{i-1})^2\mid\mathcal{F}_{i-1}]$. Then for any $s,v>0$,

\[ \mathbb{P}\left(\{S_t\ge S_0+s\,,\ V_t\le v\}\ \text{for some } t\right) \le \exp\left(-\tfrac12 s^2/(v+Ls)\right). \]

By the above theorem and a standard application of Optional Stopping, for any $s>0$, integer $t$ and stopping time $\tau$ we have $\mathbb{P}(Y_{t\wedge\tau}\ge Y_0+s)\le\exp\left(-\tfrac12 s^2/(2M^2t+Ls)\right)$. In particular, letting

\[ t_0 = 500M\log n \]

and plugging $s=\log^{3/4}n$ and $\tau=\bar\tau$ (the stopping time defined before (4.2)) in the last inequality we deduce that

\[ \mathbb{P}\left(Y_{t_0\wedge\bar\tau}\ge Y_0+\log^{3/4}n\right) \le \exp\left(-\left(\tfrac12-o(1)\right)\log^{1/12}n\right) = o(1)\,. \]

Hence, recalling the value of $Y_0$ from (4.1) we have w.h.p.

\[ Y_{t_0\wedge\bar\tau} \le 1 + M\log n + \log^{3/4}n \le 2M\log n\,, \tag{4.7} \]

where the last inequality holds for sufficiently large $n$.

In order to compare $t_0$ and $\bar\tau$, recall from Eq. (4.3) that $\mathbb{E}[\Xi_{t+1}\mid\mathcal{F}_t]\le 5$ whereas we established in Eq. (4.6) that $\mathbb{E}[(\Xi_{t+1})^2\mid\mathcal{F}_t]<60$. By Chebyshev's inequality,

\[ \mathbb{P}\left(\Xi_{t+1}\ge\log^{2/3}n\mid\mathcal{F}_t\right) = O\left(\mathbb{E}[(\Xi_{t+1})^2\mid\mathcal{F}_t]\,\log^{-4/3}n\right) = O\left(\log^{-4/3}n\right). \]

In particular, a union bound implies that

\[ \mathbb{P}(\bar\tau<t_0) = O\left(\log^{-1/3}n\right). \]

Revisiting (4.7) this immediately implies that w.h.p.

\[ Y_{t_0} \le 2M\log n\,, \]

and since $Y_{t_0\wedge\bar\tau\wedge\tau_c}\ge Z_{t_0\wedge\bar\tau\wedge\tau_c}$ (due to (4.2)) we further have that w.h.p.

\[ Y_{t_0\wedge\tau_c} \ge Z_{t_0\wedge\tau_c} \ge \frac{t_0\wedge\tau_c}{200}\,. \]

Therefore, we must have $\tau_c<t_0$ w.h.p., otherwise the last two inequalities would contradict our choice of $t_0=500M\log n$. This concludes the proof. ■
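To see how the choices $t_0=500M\log n$, $s=\log^{3/4}n$ and $L=\log^{2/3}n+\frac{1}{200}$ interact in Freedman's bound, the sketch below evaluates the exponent $\frac12 s^2/(v+Ls)$ with $v=2M^2t_0$ as a function of $\log n$. The function names are illustrative, and $\log n$ is passed directly as a float so that very large values can be explored without overflow.

```python
def freedman_exponent(log_n, M=3e7):
    """The exponent s^2 / (2*(v + L*s)) for v = 2*M^2*t0, t0 = 500*M*log n."""
    t0 = 500 * M * log_n
    v = 2 * M ** 2 * t0
    s = log_n ** 0.75
    L = log_n ** (2 / 3) + 1 / 200
    return s ** 2 / (2 * (v + L * s))

def ratio_to_limit(log_n, M=3e7):
    """Ratio of the exponent to its asymptotic form (1/2) * (log n)^(1/12)."""
    return freedman_exponent(log_n, M) / (0.5 * log_n ** (1 / 12))

for log_n in (1e2, 1e40, 1e80):
    print(log_n, freedman_exponent(log_n), ratio_to_limit(log_n))
```

The ratio tends to 1 as $\log n\to\infty$, in line with the $(\frac12-o(1))\log^{1/12}n$ exponent above. With $M=3\cdot 10^{7}$ the variance term $v$ dominates until $\log n$ is extremely large, so the estimate should be read as an asymptotic statement rather than a practical one.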

5. Super-logarithmic lower bound for the uniform process

In this section we use the analytic approximation framework introduced by Schramm to prove the super-logarithmic lower bound stated in Theorem 1 for the coalescence time of the uniform process. Recall that a key element in this framework is the normalized Laplace transform of the cluster-size distribution, namely $G_t(s)=(1/\kappa_t)F_t(\kappa_ts)$ where $F_t(s)=\sum_{i=1}^{\kappa_t}e^{-w_is}$ (see Definition 1.1). The following proposition, whose proof entails most of the technical difficulties in our analysis of the uniform process, demonstrates the effect of $G_t(\frac12)$ and $G_t(1)$ on the coalescence rate.

Proposition 5.1. Let $\varepsilon_t=1-G_t(\frac12)$ and $\zeta_t=G_t(1)$. There exists an absolute constant $C>0$ such that, conditioned on $\mathcal{F}_t$, with probability at least $1-C\kappa_t^{-100}$ we have

\[ \left|\kappa_{t+1}-(1-\varepsilon_t/2)\kappa_t\right| \le \kappa_t^{2/3}\,, \tag{5.1} \]

\[ \zeta_{t+1} \ge \zeta_t + \varepsilon_t^{13/\varepsilon_t} - 8\kappa_t^{-1/3}\,. \tag{5.2} \]

We postpone the proof of this proposition to §5.4 in favor of showing how the relations that it establishes between $\kappa_t$, $G_t(1)$, $G_t(\frac12)$ can be used to derive the desired lower bound on $\tau_c$. We claim that as long as $\kappa_t$, $G_t(\frac12)$, $G_t(1)$ satisfy Eqs. (5.1), (5.2) and $t=O\left(\log n\cdot\frac{\log\log n}{\log\log\log n}\right)$ then $\kappa_t\ge n^{2/3}$; this deterministic statement is given by the following lemma:

Lemma 5.2. Set T = ^ log n · (log log n / log log log n) for a sufficiently large $n$ and let $\kappa_0,\ldots,\kappa_T$ be a sequence of integers in $\{1,\ldots,n\}$ with $\kappa_0=n$. Further let $\varepsilon_t$ and $\zeta_t$ for $t=0,\ldots,T$ be two sequences of reals in $[0,1]$ and suppose that for all $t<T$ the three sequences satisfy inequalities (5.1) and (5.2). Then $\kappa_t\ge n^{2/3}$ for all $t\le T$.

Observe that the desired lower bound on the coalescence time of the uniform process $\mathcal{U}$ is an immediate corollary of Proposition 5.1 and Lemma 5.2. Indeed, condition on the first $t$ rounds where $0\le t<T$ (with $T$ as in Lemma 5.2) and assume $\kappa_t\ge n^{2/3}$. Proposition 5.1 implies that Eqs. (5.1), (5.2) hold except with probability $O(\kappa_t^{-100})=o(n^{-1})$. On this event Lemma 5.2 yields $\kappa_{t+1}\ge n^{2/3}$, extending our assumption to the next round. Accumulating these probabilities for all $t<T$ now shows that $\mathbb{P}(\kappa_T\ge n^{2/3})=1-o(T/n)$ and in particular $\tau_c>T$ w.h.p., as required.

Proof of lemma. The proof proceeds by induction: Assuming that $\kappa_i\ge n^{2/3}$ for all $i\le t<T$ we wish to deduce that $\kappa_{t+1}\ge n^{2/3}$.

Repeatedly applying Eq. (5.2) and using the induction hypothesis we find that

\[ \zeta_{t+1} \ge \zeta_0 + \sum_{i=0}^{t}\left(\varepsilon_i^{13/\varepsilon_i}-8\kappa_i^{-1/3}\right) \ge \sum_{i=0}^{t}\varepsilon_i^{13/\varepsilon_i} - 8(t+1)\left(n^{2/3}\right)^{-1/3} = \sum_{i=0}^{t}\varepsilon_i^{13/\varepsilon_i} - o(1)\,, \tag{5.3} \]

since $t\le T=n^{o(1)}$. Following this, we claim that the set $I=\left\{0\le i\le t : \varepsilon_i>15\,\frac{\log\log\log n}{\log\log n}\right\}$ has size at most $2(\log n)^{9/10}$. Indeed, as $x^{13/x}$ is monotone increasing for all $x\le e$, every such $i\in I$ has

\[ \varepsilon_i^{13/\varepsilon_i} \ge \left(15\,\frac{\log\log\log n}{\log\log n}\right)^{\frac{13}{15}\frac{\log\log n}{\log\log\log n}} = (\log n)^{-\frac{13}{15}+o(1)} \ge (\log n)^{-\frac{9}{10}}\,, \]

where the last inequality holds for large $n$. Hence, if we had $|I|\ge 2(\log n)^{9/10}$ then it would follow from (5.3) that $\zeta_{t+1}\ge 2-o(1)$, contradicting the assumption of the lemma for large enough $n$.

Moreover, by the assumption that $\varepsilon_i\in[0,1]$ we have $\frac12\le(1-\varepsilon_i/2)\le 1$ for all $i$. Together with the facts that $\kappa_{i+1}\ge(1-\varepsilon_i/2)\kappa_i-\kappa_i^{2/3}$ for all $i\le t$ due to (5.1) while $\kappa_i\le n$ for all $i$ we now get

\[ \kappa_{t+1} \ge \kappa_0\prod_{i=0}^{t}(1-\varepsilon_i/2) - \sum_{i=0}^{t}\kappa_i^{2/3} \ge \left(1-\frac12\cdot 15\,\frac{\log\log\log n}{\log\log n}\right)^{t+1}2^{-|I|}\,n - (t+1)\,n^{2/3} \]
\[ \ge e^{-15\frac{\log\log\log n}{\log\log n}T}\,2^{-|I|}\,n - T\,n^{2/3}\,, \]

where the last inequality used the fact that $t<T$ as well as the inequality $1-x/2\ge e^{-x}$, valid for all $0\le x\le 1$. Now, $2^{-|I|}=n^{-o(1)}$ since $|I|<2(\log n)^{9/10}$ and by the definition of $T$ we obtain that $\kappa_{t+1}\ge n^{2/3}$ for sufficiently large $n$, as claimed. This concludes the proof. ■

The remaining sections are devoted to the proof of Proposition 5.1 and are organized as follows. In §5.1 we will relate $G_t(\frac12)$ to the expected change in $\kappa_t$. While unfortunately there is no direct recursive relation for the sequence $\{G_t(\frac12) : t\ge 0\}$, in §5.2 we will approximate $\mathbb{E}[F_{t+1}(\kappa_ts)\mid\mathcal{F}_t]$ (closely related to $G_{t+1}(s)=(1/\kappa_{t+1})F_{t+1}(\kappa_{t+1}s)$) in terms of several evaluations of $G_t$. We will then refine our approximation of $G_{t+1}(1)$ in §5.3 by examining $F_{t+1}(s)$ at a point $s\approx\mathbb{E}[\kappa_{t+1}\mid\mathcal{F}_t]$. Finally, these ingredients will be combined into the proof of Proposition 5.1 in §5.4.

5.1. Relating $G_t(\frac12)$ to the coalescence rate. The next lemma shows that the value of $G_t(\frac12)$ governs the expected number of merges in round $t+1$.

Lemma 5.3. Suppose that after $t$ rounds we have $\kappa_t\ge 2$ clusters and set $\varepsilon_t=1-G_t(\frac12)$. Then

\[ \left|\mathbb{E}[\kappa_{t+1}\mid\mathcal{F}_t]-(1-\varepsilon_t/2)\kappa_t\right| \le 1\,. \tag{5.4} \]

This emphasizes the importance of tracking the value of $G_t(\frac12)$, as one could derive a lower bound on the coalescence time by showing that $G_t(\frac12)$ is sufficiently close to 1 (that is, $\varepsilon_t$ is suitably small). In order to prove this lemma we first require two straightforward facts on the functions involved.

Claim 5.4. The following holds for all $t$ with probability 1. The function $G_t(\cdot)$ is convex, decreasing and 1-Lipschitz on the domain $\mathbb{R}_+$. Furthermore, $G_t(s)\ge e^{-s}$ for any $s$.

Proof. Denote the cluster-sizes at the end of round $t$ by $w_1,\ldots,w_{\kappa_t}$. Recall that by definition $G_t(s)=(1/\kappa_t)F_t(\kappa_ts)=(1/\kappa_t)\sum_i e^{-w_i\kappa_ts}$ is an arithmetic mean of negative exponentials of $s$, hence convex and decreasing. Moreover, its first derivative is $G_t'(s)=F_t'(\kappa_ts)$ and in particular

\[ G_t'(0) = F_t'(0) = -\sum_i w_i = -1\,. \]

Since $G_t'(s)$ is increasing and negative we deduce that $G_t$ is indeed 1-Lipschitz. Finally, since the negative exponential function is convex, Jensen's inequality concludes the proof by yielding

\[ G_t(s) = \frac{1}{\kappa_t}\sum_i e^{-w_i\kappa_ts} \ge \exp\Big(-\frac{1}{\kappa_t}\sum_i w_i\kappa_ts\Big) = e^{-s}\,. \qquad ■ \]
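The properties in Claim 5.4 are easy to observe numerically. The sketch below computes $G_t(s)$ for a given list of integer cluster sizes (normalized internally to total mass 1) and checks, on a grid, that $G_t$ is decreasing, 1-Lipschitz and bounded below by $e^{-s}$. The function names and the example configuration are illustrative.

```python
import math

def G(sizes, s):
    """Normalized Laplace transform G_t(s) = (1/kappa) * sum_i exp(-w_i * kappa * s)."""
    n, kappa = sum(sizes), len(sizes)
    return sum(math.exp(-(c / n) * kappa * s) for c in sizes) / kappa

def check_claim_5_4(sizes, steps=30, h=0.1, tol=1e-12):
    """Grid check of: G decreasing, 1-Lipschitz, and G(s) >= exp(-s)."""
    values = [G(sizes, j * h) for j in range(steps + 1)]
    decreasing = all(values[j + 1] <= values[j] + tol for j in range(steps))
    lipschitz = all(values[j] - values[j + 1] <= h + tol for j in range(steps))
    above_exp = all(values[j] >= math.exp(-j * h) - tol for j in range(steps + 1))
    return decreasing and lipschitz and above_exp

print(check_claim_5_4([1, 1, 2, 5, 11]))
print(G([1] * 8, 0.5), math.exp(-0.5))
```

When all clusters have equal size the Jensen bound is attained, so `G([1] * 8, s)` coincides with $e^{-s}$ up to rounding; uneven configurations push $G_t$ above it.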

Claim 5.5. For any real numbers $0\le x\le 1$ and $\kappa>0$ we have $(1-x)^{\kappa}\ge e^{-\kappa x}-(e\kappa)^{-1}$.

Proof. Fix $\kappa>0$ and consider the function $f(x)=\kappa\left(e^{-\kappa x}-(1-x)^{\kappa}\right)$. The desired inequality is equivalent to having $f(x)\le 1/e$ for all $0\le x\le 1$, hence it suffices to bound $f(x)$ at all local maxima, then compare that bound to its values at the endpoints $f(0)=0$ and $f(1)=\kappa e^{-\kappa}$. It is easy to verify that any local extremum $x^*$ must satisfy $(1-x^*)^{\kappa-1}=e^{-\kappa x^*}$, and so

\[ f(x^*) = \kappa e^{-\kappa x^*}\left(1-(1-x^*)\right) = (\kappa x^*)e^{-\kappa x^*}\,. \]

Since $ye^{-y}\le 1/e$ for any $y\in\mathbb{R}$, both $f(x^*)$ and $f(1)$ are at most $1/e$, as required. ■
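A quick grid evaluation makes the inequality of Claim 5.5 tangible. The sketch below computes the slack $(1-x)^{\kappa}-\left(e^{-\kappa x}-(e\kappa)^{-1}\right)$ over $x\in[0,1]$ for several values of $\kappa$ and reports its minimum; the function names and the chosen values of $\kappa$ are illustrative.

```python
import math

def gap(x, kappa):
    """Slack (1-x)^kappa - (exp(-kappa*x) - 1/(e*kappa)); Claim 5.5 asserts gap >= 0."""
    return (1 - x) ** kappa - (math.exp(-kappa * x) - 1 / (math.e * kappa))

def min_gap(kappa, grid=10000):
    """Minimum of the slack over the grid x = 0, 1/grid, ..., 1."""
    return min(gap(j / grid, kappa) for j in range(grid + 1))

for kappa in (0.5, 1, 2, 7.3, 50, 1000):
    print(kappa, min_gap(kappa))
```

For $\kappa=1$ the slack vanishes at $x=1$, so the additive term $(e\kappa)^{-1}$ cannot be improved in general.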

Proof of Lemma 5.3. Let $\kappa=\kappa_t$ and $\kappa'=\kappa_{t+1}$, and as usual let $w_1,\ldots,w_\kappa$ denote the cluster-sizes at the end of $t$ rounds. Recalling the definition of the uniform coalescence process, the number of pairs of clusters that merge in round $t+1$ is equal to the number of clusters which

(a) select to be acceptors in this round, and

(b) receive at least one incoming request in this round.

(Compare this simple characterization with the number of merges in the size-biased process, where one must also consider the cluster-sizes of the incoming requests relative to the size of the acceptor.) A given cluster $C_i$ becomes an acceptor with probability $\frac12$, and conditioning on this event we are left with $\kappa-1$ other clusters, each of which may send a request to the cluster $C_i$ with probability $w_i/2$ (the factor of 2 accounts for the choice to issue rather than accept requests this round) independently of its peers. Altogether we conclude that the probability that $C_i$ accepts an incoming request is exactly $\frac12\left(1-(1-w_i/2)^{\kappa-1}\right)$ and so the expected total number of merges is

\[ \mathbb{E}[\kappa-\kappa'\mid\mathcal{F}_t] = \frac12\sum_i\left(1-(1-w_i/2)^{\kappa-1}\right). \]

Therefore,

\[ \mathbb{E}[\kappa-\kappa'\mid\mathcal{F}_t] \ge \frac12\sum_i\left(1-e^{-w_i(\kappa-1)/2}\right) = \frac{\kappa}{2}\left(1-G_t\Big(\frac12-\frac{1}{2\kappa}\Big)\right) \ge \frac{\kappa}{2}\left(1-G_t\big(\tfrac12\big)\right)-\frac14 = \frac{\varepsilon_t\kappa}{2}-\frac14\,, \]

where the last inequality is due to $G_t$ being 1-Lipschitz as was established in Claim 5.4. For an upper bound on the expected number of merges we apply Claim 5.5, from which it follows that

\[ \mathbb{E}[\kappa-\kappa'\mid\mathcal{F}_t] \le \frac12\sum_i\left(1-e^{-w_i(\kappa-1)/2}+\frac{1}{e(\kappa-1)}\right) \le \frac12\sum_i\left(1-e^{-w_i\kappa/2}\right)+\frac{\kappa}{2e(\kappa-1)} \le \frac{\varepsilon_t\kappa}{2}+\frac1e\,. \]

Combining these bounds gives the required result. ■
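Lemma 5.3 compares an exact finite sum with the quantity $\varepsilon_t\kappa_t/2$, so it can be checked on concrete configurations. The sketch below computes both sides for the uniform process from a list of integer cluster sizes; the function names and the sample configurations are illustrative.

```python
import math

def expected_merges(sizes):
    """Exact E[kappa - kappa'] = (1/2) * sum_i (1 - (1 - w_i/2)^(kappa-1))."""
    n, kappa = sum(sizes), len(sizes)
    return 0.5 * sum(1 - (1 - (c / n) / 2) ** (kappa - 1) for c in sizes)

def epsilon(sizes):
    """epsilon_t = 1 - G_t(1/2), with G_t the normalized Laplace transform."""
    n, kappa = sum(sizes), len(sizes)
    return 1 - sum(math.exp(-(c / n) * kappa / 2) for c in sizes) / kappa

def discrepancy(sizes):
    """E[kappa - kappa'] minus epsilon_t * kappa / 2."""
    return expected_merges(sizes) - epsilon(sizes) * len(sizes) / 2

for sizes in ([1, 1], [3, 5], [1] * 100, [1] * 50 + [50], [1] * 10 + [1000]):
    print(len(sizes), round(discrepancy(sizes), 4))
```

On these inputs the discrepancy lands between $-\frac14$ and $\frac1e$, the two one-sided error terms that arise when bounding $(1-w_i/2)^{\kappa-1}$ from above and below.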

5.2. Recursive approximation for $F_t$. Despite the fact that there is no direct recursion for the values of $G_t$ it turns out that on the level of expectation one can recover values of its counterpart $F_{t+1}$ from several different evaluations of $G_t$. Note that this still does not provide an estimate for the expected value of $G_{t+1}$, as the transformation between the $F_{t+1}$ and $G_{t+1}$ unfortunately involves the number of clusters at time $t+1$, thereby introducing nonlinearity to the approximation.

Lemma 5.6. Suppose that after $t$ rounds $\kappa_t\ge 2$ and let $\varepsilon_t=1-G_t(\frac12)$. Then

\[ \mathbb{E}[F_{t+1}(\kappa_ts)\mid\mathcal{F}_t] \ge (1-\varepsilon_t/2)\,\kappa_t\left[\frac{\alpha}{\alpha+\beta}\,G_t(s)+\frac{\beta}{\alpha+\beta}\,G_t\big(s+\tfrac12\big)\right]-2\,, \tag{5.5} \]

where

\[ \alpha=\alpha(s,t)=G_t(s)+G_t\big(\tfrac12\big)\,,\qquad \beta=\beta(s,t)=1-G_t(s)\,. \]

Remark. Although the approximation in Eq. (5.5) may look intractable, its structure is in fact quite useful: The leading factor $(1-\varepsilon_t/2)\kappa_t$ is essentially $\mathbb{E}[\kappa_{t+1}\mid\mathcal{F}_t]$ from Lemma 5.3, which is particularly convenient as we will need to divide by $\kappa_{t+1}$ to pass from $F_{t+1}$ to $G_{t+1}$.

Proof of lemma. As before, let k = Kt and denote the cluster-sizes by wi,. . . ,w^. We account for 
the change Ft+i{s) — Ft{s) as follows: Should the clusters Ci and Cj merge in round t + 1, this would 
contribute exactly e-^'"-+'"i> - e""'^^ - e""'^^ to Ft+i{s) - Ft{s). Thus, E[Ft+i{s) - Ft{s) \ Tt] is 
simply the sum of these expressions, weighted by the probabilities that the individual pairs merge. 

Let us calculate the probability that d accepts an incoming request from the cluster Cj . First let 
Ri denote the event that Cj accepts an incoming request from some cluster, which was shown in the 
proof of Lemma [531 to satisfy P(i?i | Tt) = ^{^ ~ ~ Wi/2)'^^^^ . Crucially, the fact that acceptors 
select an incoming request to merge with via a uniform law now implies that, given Ri, the identity 
of the cluster that Ci merges with is uniform over the remaining k — 1 clusters by symmetry. In 
particular, the probability that Cj accepts a merge request from Cj equals P(i?i | Ft)/{K,— 1) and so 

nFt+i{s) - Ft{s) \H = Y. - e-^^' - e""-^-) ^ (l " (1 " ^./2)''"') ^ . (5.6) 

The term (1 - Wi/2y^ is greater or equal to e-('^~^)"'»/2 - [e{K - 1)]-^ > e'^'^''/^ - [e(K - l)]-i 
by Claim [531 Since ^-i'^i+'^j)^ — ^-^i^^ _ g-'^j^ [g always negative by convexity, this gives 

E[Ft+i{s) - Ft{s) I -Ft] > V (e-(-^+-^> - e-^'' - e-^^') Ul- e'^^^l'' + -j^-^ ^ • 

^ \ ' 2 \ e{K — \) K—\ 

Next, observe that 

= _!+(!_ e-"'^^) (1 - e""'^^) > -1 , (5.7) 



e 



{Wi+Wj)s _ -WiS 
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hence we can sum the effect of the term l/(e(K — 1)) over all k{k — 1) indices i ^ j and get 



E[Ft+i{s) - Ftis) \ :Ft] > 



2{k 



1 



2e K-1) 



Note that the last expression has magnitude at most 1/e due to the assumption k > 2. Furthermore, 
each of the k{k — 1) summands in the summation over i ^ j has magnitude at most 1, hence 
we may replace the factor — 1) with I/k in front of the summation at a maximal cost of 



E[Ft+,is)-Ft{s)\:Ft]>^Y. 



> 



2k 

-E 

2k ^ 



-(uii+«)j)s 



-{Wi-\-Wj)s 



1 



-WiK/2 



-WiK/2 



(5.8) 



where the last inequality is due to each of the k diagonal terms i = j having magnitude at most 1. 
Since (j5.5p addresses -Ft+i(Ks) rather than Ft+i{s) we now focus on the following summation: 

-WiK/2 



-WihiS — WjKS 



1 



-Wi{KS+^)—WjKS 



-IViKS 



+ e 



-WjKS — WiK/2 



Ft (Ksf -Ft{KS + -)Ft{Ks)-KFt{Ks) + KFt{KS + 



KFtiKs)+Ft(-)FtiKs). 



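Using this in (5.8), noting that the term $-F_t(\kappa s)$ cancels out, we find that

$$\begin{aligned}
\mathbb{E}\bigl[F_{t+1}(\kappa s) \mid \mathcal{F}_t\bigr]
&\ge \frac{1}{2\kappa}\Bigl[ F_t(\kappa s)^2 - F_t\bigl(\kappa s + \tfrac{\kappa}{2}\bigr)F_t(\kappa s) + \kappa F_t\bigl(\kappa s + \tfrac{\kappa}{2}\bigr) + F_t\bigl(\tfrac{\kappa}{2}\bigr)F_t(\kappa s) \Bigr] - 2 \\
&= \frac{\kappa + F_t(\frac{\kappa}{2})}{2} \left[ \frac{F_t(\kappa s) + F_t(\frac{\kappa}{2})}{\kappa + F_t(\frac{\kappa}{2})} \cdot \frac{F_t(\kappa s)}{\kappa} + \frac{\kappa - F_t(\kappa s)}{\kappa + F_t(\frac{\kappa}{2})} \cdot \frac{F_t(\kappa s + \frac{\kappa}{2})}{\kappa} \right] - 2 \\
&= \kappa\,\frac{1 + G_t(\frac12)}{2} \left[ \frac{\alpha}{\alpha+\beta}\, G_t(s) + \frac{\beta}{\alpha+\beta}\, G_t\bigl(s + \tfrac12\bigr) \right] - 2,
\end{aligned}$$

where $\alpha = G_t(s) + G_t(\frac12)$ and $\beta = 1 - G_t(s)$, thus establishing (5.5). ■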
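The passage from the $F_t$ form of the bound to the weighted-mean form in $G_t$ uses only the relation $G_t(s) = F_t(\kappa s)/\kappa$ and the identity $\alpha + \beta = 1 + G_t(\frac12)$. A small numerical sketch of that rewriting follows, with hypothetical helper names, sizes assumed normalized to sum to 1, and the additive constant $-2$ omitted on both sides.

```python
import math

def F(w, s):
    """F_t(s) = sum_i exp(-w_i * s)."""
    return sum(math.exp(-wi * s) for wi in w)

def G(w, s):
    """G_t(s) = F_t(kappa * s) / kappa, with kappa the number of clusters."""
    k = len(w)
    return F(w, k * s) / k

def bound_in_F(w, s):
    """Main term of the lower bound, written through F_t."""
    k = len(w)
    a, b, c = F(w, k * s), F(w, k * s + k / 2.0), F(w, k / 2.0)
    return (a * a - b * a + k * b + c * a) / (2.0 * k)

def bound_in_G(w, s):
    """The same term as kappa * (1 + G(1/2)) / 2 times a weighted mean of G."""
    k = len(w)
    alpha = G(w, s) + G(w, 0.5)
    beta = 1.0 - G(w, s)
    mean = (alpha * G(w, s) + beta * G(w, s + 0.5)) / (alpha + beta)
    return k * (1.0 + G(w, 0.5)) / 2.0 * mean

if __name__ == "__main__":
    w = [0.7, 0.1, 0.1, 0.1]
    print(bound_in_F(w, 0.8), bound_in_G(w, 0.8))
```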

5.3. Quantifying the convexity correction in the recursion for $F_t$. Examine the recursion established in Lemma 5.6. In order to derive lower bounds on the $F_t$'s, we recognize the second factor in the r.h.s. of (5.5) as a weighted arithmetic mean of two evaluations of $G_t$. Recalling that $G_t$ is a convex combination of negative exponentials, we will now estimate the "convexity correction" between $G_t$ and its weighted mean. It is precisely this increment which will allow us to show that $G_t$ rises toward 1 at a nontrivial rate, as the following lemma demonstrates:

Lemma 5.7. Suppose after $t$ rounds $\kappa_t \ge 2$ and let $\varepsilon_t = 1 - G_t(\frac12)$ and $\kappa^* = (1 - \varepsilon_t/2)\kappa_t$. Then

$$\mathbb{E}\bigl[F_{t+1}(\kappa^*) \mid \mathcal{F}_t\bigr] \ge \bigl[G_t(1) + \varepsilon_t^{13/\varepsilon_t}\bigr]\kappa^* - 2. \tag{5.9}$$

Indeed, by Lemma 5.3 we recognize that $\kappa^*$ is approximately $\mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t]$, hence, postponing for the moment concentration arguments, one sees that Eq. (5.9) resembles the form of Eq. (5.2). Our first step in proving this lemma will be to establish a lower bound similar to (5.9) which replaces the $\varepsilon_t^{13/\varepsilon_t}$ term by the convexity correction between $G_t$ and its weighted mean from Eq. (5.5).
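Claim 5.8. Suppose after $t$ rounds $\kappa_t \ge 2$ and let $\varepsilon_t = 1 - G_t(\frac12)$ and $\kappa^* = (1 - \varepsilon_t/2)\kappa_t$. Let $h(s)$ be the secant line intersecting $G_t(s)$ at $s_1 = \kappa^*/\kappa_t$ and $s_2 = s_1 + \frac12$. Let $\theta = \frac{\alpha}{\alpha+\beta}s_1 + \frac{\beta}{\alpha+\beta}s_2$ where $\alpha = G_t(s_1) + G_t(\frac12)$ and $\beta = 1 - G_t(s_1)$, and let $\Delta = h(\theta) - G_t(\theta)$. Then

$$\mathbb{E}\bigl[F_{t+1}(\kappa^*) \mid \mathcal{F}_t\bigr] \ge \bigl[G_t(1) + \Delta\bigr]\kappa^* - 2 \tag{5.10}$$

and in addition

$$\frac{\varepsilon_t}{4} \le \theta - s_1 \le \frac14. \tag{5.11}$$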

Proof. Applying Lemma 5.6 with $s = s_1$ and rewriting its statement in terms of $h, \theta, \Delta$ give

$$\mathbb{E}\bigl[F_{t+1}(\kappa^*) \mid \mathcal{F}_t\bigr] \ge (1 - \varepsilon_t/2)\kappa_t\, h(\theta) - 2 = h(\theta)\kappa^* - 2 = \bigl[G_t(\theta) + \Delta\bigr]\kappa^* - 2.$$

Since $G_t$ is decreasing, as established in an earlier claim, (5.10) will follow from showing that $\theta \le 1$. Note that $\theta$ is a weighted mean between $s_1 = 1 - \varepsilon_t/2$ and $s_2 = s_1 + \frac12$, and so it is not immediate that $\theta \le 1$. To show that this is the case, we argue as follows.

Recalling the definition of $\theta$, we wish to show that $\alpha s_1 + \beta(s_1 + \frac12) \le \alpha + \beta$ where $\alpha = G_t(s_1) + G_t(\frac12)$ and $\beta = 1 - G_t(s_1)$. Observe that $\alpha + \beta = 1 + G_t(\frac12) = 2 - \varepsilon_t = 2s_1$ by definition. Therefore, $\theta \le 1$ if and only if $(\alpha + \beta)s_1 + \beta/2 \le 2s_1$, or equivalently

$$2\bigl(G_t(\tfrac12) - 1\bigr)s_1 + 1 \le G_t(s_1).$$

We claim that indeed

$$2\bigl(G_t(\tfrac12) - 1\bigr)s + 1 \le G_t(s) \quad \text{for any } s \ge \tfrac12, \tag{5.12}$$

which would in particular imply that it holds for $s = s_1 \ge \frac12$, since $s_1 = 1 - \varepsilon_t/2$ with $\varepsilon_t \le 1$. In order to verify (5.12) observe that its l.h.s. is an affine function of $s$ whereas the r.h.s. is convex, and that equality holds for $s = 0$ (recall that $G_t(0) = 1$) and $s = \frac12$. Thus, the affine l.h.s. does not exceed the convex r.h.s. for any $s \ge \frac12$, as required. We now conclude that $\theta \le 1$, establishing (5.10).

It remains to prove (5.11). Since $\theta$ is a weighted arithmetic mean of $s_1$ and $s_2 = s_1 + \frac12$, the upper bound will follow once we show that the weight on $s_1$ exceeds the weight on $s_2$, that is, when $\alpha \ge \beta$ or equivalently

$$2G_t(s_1) + G_t(\tfrac12) \ge 1.$$

This indeed holds, as an earlier claim established that $G_t(s) \ge e^{-s}$ and therefore the l.h.s. above is at least $2e^{-s_1} + e^{-1/2} \ge 2/e + e^{-1/2} > 1$, where we used the fact that $s_1 = 1 - \varepsilon_t/2 \le 1$.

For the lower bound in (5.11), recall that $G_t$ is decreasing and $G_t(0) = 1$, which together with the aforementioned fact that $s_1 \ge \frac12$ gives

$$\frac{\beta}{\alpha+\beta} = \frac{1 - G_t(s_1)}{1 + G_t(\frac12)} \ge \frac{1 - G_t(\frac12)}{2} = \frac{\varepsilon_t}{2}.$$

The proof is now concluded by noting that $\theta - s_1 = \frac12 \cdot \frac{\beta}{\alpha+\beta}$ by definition. ■
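The location of the weighted-mean point $\theta$ can be explored numerically. The sketch below is an illustration under stated assumptions rather than part of the proof: `G` and `weighted_mean_point` are hypothetical names, sizes are assumed normalized to sum to 1, and the bounds checked are $\varepsilon_t/4 \le \theta - s_1 \le \frac14$ together with $\theta \le 1$.

```python
import math

def G(w, s):
    """G_t(s) = (1/kappa) * sum_i exp(-w_i * kappa * s)."""
    k = len(w)
    return sum(math.exp(-wi * k * s) for wi in w) / k

def weighted_mean_point(w):
    """Return (eps, s1, theta) as defined in the statement of the claim."""
    eps = 1.0 - G(w, 0.5)
    s1 = 1.0 - eps / 2.0
    alpha = G(w, s1) + G(w, 0.5)
    beta = 1.0 - G(w, s1)
    theta = (alpha * s1 + beta * (s1 + 0.5)) / (alpha + beta)
    return eps, s1, theta

if __name__ == "__main__":
    for w in ([0.25] * 4, [0.9, 0.05, 0.05]):
        eps, s1, theta = weighted_mean_point(w)
        print(eps / 4.0, theta - s1, theta)
```

For equal sizes $G_t(s) = e^{-s}$, which is the extreme case of the bound $G_t(s) \ge e^{-s}$; skewed size vectors move $\theta$ further below 1.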

Next, we will provide a lower bound on the convexity correction in terms of the difference between two evaluations of $G_t$.

Claim 5.9. Let $s \le 1$ and let $h$ be the secant line intersecting $G_t$ at $s$ and $s + \frac12$. For any $0 < \delta \le \frac14$,

$$h(s + \delta) - G_t(s + \delta) \ge \frac{\delta^2}{2}\bigl[G_t(\tfrac12) - G_t(1)\bigr]^2.$$
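Proof. Let $g$ denote the secant line intersecting $G_t$ at $s$ and $s + 2\delta$. Since $\delta \le \frac14$ and $G_t$ is a decreasing convex function,

$$G_t(s + \delta) \le g(s + \delta) \le h(s + \delta).$$

It thus suffices to show the following to deduce the statement of the claim:

$$g(s + \delta) - G_t(s + \delta) \ge \frac{\delta^2}{2}\bigl[G_t(\tfrac12) - G_t(1)\bigr]^2, \tag{5.13}$$

which has a particularly convenient l.h.s. due to the fact that $g(s + \delta) = \frac12[G_t(s) + G_t(s + 2\delta)]$ by definition. Now let $\kappa = \kappa_t$ and let $w_1, \ldots, w_\kappa$ be the cluster-sizes at the end of round $t$. We have

$$\begin{aligned}
\tfrac12\bigl[G_t(s) + G_t(s + 2\delta)\bigr] - G_t(s + \delta)
&= \frac{1}{2\kappa}\sum_i \Bigl[e^{-w_i\kappa s} - 2e^{-w_i\kappa(s+\delta)} + e^{-w_i\kappa(s+2\delta)}\Bigr] \\
&= \frac{1}{2\kappa}\sum_i e^{-w_i\kappa s}\bigl(1 - e^{-w_i\kappa\delta}\bigr)^2.
\end{aligned} \tag{5.14}$$

By Cauchy-Schwarz, the r.h.s. of (5.14) satisfies

$$\frac{1}{2\kappa}\sum_i e^{-w_i\kappa s}\bigl(1 - e^{-w_i\kappa\delta}\bigr)^2 \ge \frac12\Bigl[\frac1\kappa\sum_i e^{-w_i\kappa s/2}\bigl(1 - e^{-w_i\kappa\delta}\bigr)\Bigr]^2 = \frac12\Bigl[G_t\bigl(\tfrac{s}{2}\bigr) - G_t\bigl(\tfrac{s}{2} + \delta\bigr)\Bigr]^2. \tag{5.15}$$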



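Set $K = \lceil 1/(2\delta) \rceil$, noting that $K \le 1/\delta$ as $\delta \le \frac14$. Since $G_t$ is a decreasing convex function we have

$$G_t\bigl(\tfrac{s}{2}\bigr) - G_t\bigl(\tfrac{s}{2} + \delta\bigr) \ge G_t\bigl(\tfrac{s}{2} + (j-1)\delta\bigr) - G_t\bigl(\tfrac{s}{2} + j\delta\bigr) \quad \text{for any } j \ge 1,$$

and summing these inequalities for $j = 1, \ldots, K$ yields

$$G_t\bigl(\tfrac{s}{2}\bigr) - G_t\bigl(\tfrac{s}{2} + \delta\bigr) \ge \frac1K\Bigl[G_t\bigl(\tfrac{s}{2}\bigr) - G_t\bigl(\tfrac{s}{2} + K\delta\bigr)\Bigr],$$

which is at least $(1/K)[G_t(\frac12) - G_t(1)]$, once again since $s \le 1$ and $G_t$ is convex and decreasing. Therefore, since $K \le 1/\delta$ we can conclude that

$$G_t\bigl(\tfrac{s}{2}\bigr) - G_t\bigl(\tfrac{s}{2} + \delta\bigr) \ge \delta\bigl[G_t(\tfrac12) - G_t(1)\bigr],$$

which together with (5.14), (5.15) now establishes (5.13) and thus completes the proof. ■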
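The secant-gap bound lends itself to a direct numerical check. The sketch below is illustrative only: it assumes the bound in the form $h(s+\delta) - G_t(s+\delta) \ge \frac{\delta^2}{2}[G_t(\frac12) - G_t(1)]^2$ for $s \le 1$ and $0 < \delta \le \frac14$, the helper names are hypothetical, and sizes are assumed normalized to sum to 1.

```python
import math

def G(w, s):
    """G_t(s) = (1/kappa) * sum_i exp(-w_i * kappa * s)."""
    k = len(w)
    return sum(math.exp(-wi * k * s) for wi in w) / k

def secant_gap(w, s, delta):
    """h(s + delta) - G(s + delta), where h is the secant of G through s and s + 1/2."""
    slope = (G(w, s + 0.5) - G(w, s)) / 0.5
    h_value = G(w, s) + slope * delta
    return h_value - G(w, s + delta)

def claimed_lower_bound(w, delta):
    """(delta^2 / 2) * (G(1/2) - G(1))^2."""
    gamma = G(w, 0.5) - G(w, 1.0)
    return 0.5 * delta ** 2 * gamma ** 2

if __name__ == "__main__":
    w = [0.25] * 4
    for delta in (0.01, 0.1, 0.25):
        print(delta, secant_gap(w, 1.0, delta), claimed_lower_bound(w, delta))
```

In these examples the actual gap is considerably larger than the bound, which is all the subsequent argument requires.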

The above claim quantified the convexity correction in terms of $G_t(\frac12) - G_t(1)$, and next we wish to estimate this quantity in terms of the key parameter $\varepsilon_t = 1 - G_t(\frac12)$, which governs the coalescence rate as was established by Lemma 5.3.

Claim 5.10. For any $t$ we have $G_t(\frac12) - G_t(1) \ge \varepsilon_t^{5/\varepsilon_t}$, where $\varepsilon_t = 1 - G_t(\frac12)$.
Proof. We first claim that

$$G_t(s) - G_t(2s) \le \sqrt{G_t(2s) - G_t(4s)} \quad \text{for any } s \ge 0. \tag{5.16}$$
Indeed, let $\kappa = \kappa_t$, let $w_1, \ldots, w_\kappa$ be the cluster-sizes after time $t$ and define

$$\begin{aligned}
X &= G_t(0) - G_t(s) = \frac1\kappa\sum_i \bigl(1 - e^{-w_i\kappa s}\bigr), \\
Y &= G_t(s) - G_t(2s) = \frac1\kappa\sum_i e^{-w_i\kappa s}\bigl(1 - e^{-w_i\kappa s}\bigr), \\
Z &= G_t(2s) - G_t(3s) = \frac1\kappa\sum_i e^{-2w_i\kappa s}\bigl(1 - e^{-w_i\kappa s}\bigr).
\end{aligned}$$

By Cauchy-Schwarz, $Y \le \sqrt{XZ}$. Moreover, $XZ \le Z \le G_t(2s) - G_t(4s)$ since $G_t$ is decreasing and $G_t(0) = 1$, and combining these inequalities now establishes (5.16).
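The doubling inequality $G_t(s) - G_t(2s) \le \sqrt{G_t(2s) - G_t(4s)}$ can be spot-checked numerically for any normalized size vector. The following sketch uses hypothetical helper names and a small floating-point tolerance; it is an illustration, not a substitute for the Cauchy-Schwarz argument.

```python
import math

def G(w, s):
    """G_t(s) = (1/kappa) * sum_i exp(-w_i * kappa * s)."""
    k = len(w)
    return sum(math.exp(-wi * k * s) for wi in w) / k

def doubling_gap_holds(w, s, tol=1e-12):
    """Check G(s) - G(2s) <= sqrt(G(2s) - G(4s)) up to rounding error."""
    lhs = G(w, s) - G(w, 2.0 * s)
    rhs = math.sqrt(max(0.0, G(w, 2.0 * s) - G(w, 4.0 * s)))
    return lhs <= rhs + tol

if __name__ == "__main__":
    w = [0.7, 0.1, 0.1, 0.1]
    print([doubling_gap_holds(w, s) for s in (0.01, 0.1, 0.5, 1.0, 3.0)])
```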

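Let $\gamma = G_t(\frac12) - G_t(1)$ and let $r \ge 2$. A repeated application of (5.16) reveals that

$$G_t\bigl(2^{-k}\bigr) - G_t\bigl(2^{-(k-1)}\bigr) \le \gamma^{1/2^{k-1}} \quad \text{for } k = 1, 2, \ldots, r,$$

and summing these inequalities we find that

$$G_t\bigl(2^{-r}\bigr) - G_t\bigl(\tfrac12\bigr) \le \sum_{k=1}^{r} \gamma^{1/2^{k-1}} \le r\gamma^{1/2^{r-1}}.$$

On the other hand, since $G_t$ is 1-Lipschitz we also have $G_t(2^{-r}) \ge G_t(0) - 2^{-r} = 1 - 2^{-r}$. At this point, recalling that $\varepsilon_t = 1 - G_t(\frac12)$ and combining it with the above bounds gives

$$\varepsilon_t - 2^{-r} \le G_t\bigl(2^{-r}\bigr) - G_t\bigl(\tfrac12\bigr) \le r\gamma^{1/2^{r-1}}. \tag{5.17}$$

The above inequality is valid for any integer $r \ge 2$ and we now choose $r = \lceil \log_2(4/(3\varepsilon_t)) \rceil$, or equivalently $r$ is the least integer such that $2^{-r} \le \frac34\varepsilon_t$. One should notice that indeed $r \ge 2$ since we have $\varepsilon_t < \frac25$, which in turn follows from the fact that $G_t(s) \ge e^{-s}$ (established in an earlier claim), yielding

$$\varepsilon_t \le 1 - e^{-1/2} < \tfrac25. \tag{5.18}$$

Revisiting Eq. (5.17) and using the fact that $2^{-r} \le \frac34\varepsilon_t$ we find that $\varepsilon_t/4 \le r\gamma^{1/2^{r-1}}$, and after rearranging $\gamma \ge (\varepsilon_t/4r)^{2^{r-1}}$. Moreover, by definition $r \le \log_2(8/(3\varepsilon_t))$ and as one can easily verify that $4\log_2(8/(3x)) \le x^{-11/4}$ for all $0 < x \le \frac25$ (which by (5.18) covers the range of $\varepsilon_t$), we have $4r \le \varepsilon_t^{-11/4}$. The choice of $r$ further implies that $2^{r-1} \le 4/(3\varepsilon_t)$ and combining these bounds gives

$$\gamma \ge \Bigl(\frac{\varepsilon_t}{4r}\Bigr)^{4/(3\varepsilon_t)} \ge \Bigl(\varepsilon_t^{15/4}\Bigr)^{4/(3\varepsilon_t)} = \varepsilon_t^{5/\varepsilon_t},$$

as claimed.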
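The choice of $r$ in the proof above can be made concrete. The sketch below assumes the reading $r = \lceil \log_2(4/(3\varepsilon_t)) \rceil$, the least integer with $2^{-r} \le \frac34\varepsilon_t$, and checks the intermediate bound $\gamma \ge (\varepsilon_t/4r)^{2^{r-1}}$ on a few normalized size vectors. All helper names are hypothetical.

```python
import math

def G(w, s):
    """G_t(s) = (1/kappa) * sum_i exp(-w_i * kappa * s)."""
    k = len(w)
    return sum(math.exp(-wi * k * s) for wi in w) / k

def choose_r(eps):
    """Least integer r with 2^(-r) <= (3/4) * eps."""
    return math.ceil(math.log2(4.0 / (3.0 * eps)))

def gamma_lower_bound(eps):
    """Intermediate bound (eps / (4r))^(2^(r-1)) on gamma = G(1/2) - G(1)."""
    r = choose_r(eps)
    return (eps / (4.0 * r)) ** (2 ** (r - 1))

if __name__ == "__main__":
    for w in ([0.5, 0.5], [0.9, 0.05, 0.05]):
        eps = 1.0 - G(w, 0.5)
        gamma = G(w, 0.5) - G(w, 1.0)
        print(eps, choose_r(eps), gamma, gamma_lower_bound(eps))
```

The bound is far from tight on these inputs; its role in the argument is only to give a positive lower bound on $\gamma$ that depends on $\varepsilon_t$ alone.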



We are now ready to establish Eq. (5.9), the quantitative bound on the convexity correction in the weighted mean of Eq. (5.5).

Proof of Lemma 5.7. By Claim 5.8, in order to prove Eq. (5.9) it suffices to show that $\Delta \ge \varepsilon_t^{13/\varepsilon_t}$ with $\Delta$ as defined in the statement of that claim. Using Eq. (5.11) of Claim 5.8 we can write $\Delta = h(s_1 + \delta) - G_t(s_1 + \delta)$ where $h$ is the secant line defined in that claim, $s_1 = 1 - \varepsilon_t/2$ and $\delta$ satisfies $\varepsilon_t \le 4\delta \le 1$. Therefore, Claim 5.9 implies that $\Delta \ge \frac12(\varepsilon_t/4)^2[G_t(\frac12) - G_t(1)]^2$. Applying Claim 5.10 we find that

$$\Delta \ge \frac{\varepsilon_t^2}{32}\,\varepsilon_t^{10/\varepsilon_t} \ge \varepsilon_t^{13/\varepsilon_t},$$

where we consolidated the constant factors into the exponent using the fact that $x^2/32 \ge x^{3/x}$ for all $0 < x \le \frac25$ while bearing in mind that by (5.18) indeed $\varepsilon_t \le \frac25$. ■

5.4. Proof of Proposition 5.1. Let $\kappa = \kappa_t$ and note that w.l.o.g. we may assume that $\kappa$ is sufficiently large by choosing the constant $C$ from the statement of the proposition appropriately.

Let $w_1, \ldots, w_\kappa$ denote the cluster-sizes. As argued before, given $\mathcal{F}_t$ one can realize round $t + 1$ of the process by a $\kappa$-dimensional product space, where clusters behave independently as follows:

(1) For each $i$, the cluster $C_i$ decides whether to send or accept requests via a fair coin toss.

(2) When sending a request $C_i$ selects its recipient cluster randomly (proportionally to the $w_j$'s).

(3) When accepting requests $C_i$ generates a random real number between 0 and 1 to be used to select the incoming merge-request it will grant (uniformly over all the incoming requests).
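The three steps above can be sketched as a single simulated round. This is a minimal illustration under stated assumptions: `run_round` is a hypothetical name, sizes may be unnormalized positive numbers, and a request addressed to a cluster that is itself issuing (including the sender itself) is simply discarded.

```python
import random

def run_round(sizes, rng):
    """Simulate one round and return the list of cluster sizes after merging."""
    k = len(sizes)
    # (1) Fair coin per cluster: True means "issue a request", False means "accept".
    issuer = [rng.random() < 0.5 for _ in range(k)]
    # (2) Each issuer picks a recipient with probability proportional to size.
    incoming = [[] for _ in range(k)]
    for i in range(k):
        if issuer[i]:
            j = rng.choices(range(k), weights=sizes)[0]
            if not issuer[j]:  # requests sent to issuers (or to oneself) are void
                incoming[j].append(i)
    # (3) Each acceptor draws u in [0, 1) and uses it to grant one request uniformly.
    absorbed_by = {}
    for j in range(k):
        if incoming[j]:
            u = rng.random()
            pick = min(int(u * len(incoming[j])), len(incoming[j]) - 1)
            absorbed_by[incoming[j][pick]] = j
    merged = list(sizes)
    for i, j in absorbed_by.items():
        merged[j] += sizes[i]
    return [merged[c] for c in range(k) if c not in absorbed_by]

if __name__ == "__main__":
    rng = random.Random(1)
    clusters = [1] * 100
    print(len(run_round(clusters, rng)))
```

Since every merge pairs one issuer with one acceptor, the merging pairs are disjoint and the number of clusters can at most halve in a round, while the total size is preserved.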

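As such, conditioned on $\mathcal{F}_t$ the variable $\kappa_{t+1}$ is clearly 1-Lipschitz w.r.t. the above product space since changing the value corresponding to the action of one cluster can affect at most one merge. Thus, by a standard well-known coupling argument (see e.g. [3]) the increments of the corresponding Doob martingale are bounded by 1 (i.e. $|M_{i+1} - M_i| \le 1$ where $M_i = \mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t^i]$ with $\mathcal{F}_t^i$ being the $\sigma$-algebra generated by the actions of clusters $1, \ldots, i$ and $\mathcal{F}_t$). Hoeffding's inequality now gives

$$\mathbb{P}\bigl(\,\bigl|\kappa_{t+1} - \mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t]\bigr| > a \bigm| \mathcal{F}_t\bigr) \le 2\exp\bigl(-a^2/2\kappa\bigr) \quad \text{for any } a > 0.$$

Letting $\kappa^* = (1 - \varepsilon_t/2)\kappa$ we recall from Lemma 5.3 that $|\mathbb{E}[\kappa_{t+1} \mid \mathcal{F}_t] - \kappa^*| \le 1$ and obtain that

$$\mathbb{P}\bigl(|\kappa_{t+1} - \kappa^*| > \kappa^{2/3} \bigm| \mathcal{F}_t\bigr) \le 2\exp\Bigl(-\tfrac12\bigl(\kappa^{2/3} - 1\bigr)^2/\kappa\Bigr) \le 2\exp\Bigl(-\tfrac12\kappa^{1/3} + O\bigl(\kappa^{-1/3}\bigr)\Bigr) < \kappa^{-100}, \tag{5.19}$$

where the last inequality holds for any sufficiently large $\kappa$, thus establishing (5.1).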
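To get a feel for how large $\kappa$ must be before the Hoeffding tail drops below $\kappa^{-100}$, one can compare the two sides on a logarithmic scale. The sketch below is an illustration with hypothetical function names; it works with logarithms because both quantities underflow in floating point.

```python
import math

def log_tail_bound(k):
    """Natural log of 2 * exp(-(k^(2/3) - 1)^2 / (2k))."""
    return math.log(2.0) - (k ** (2.0 / 3.0) - 1.0) ** 2 / (2.0 * k)

def beats_polynomial(k, power=100):
    """True when the tail bound is smaller than k^(-power)."""
    return log_tail_bound(k) < -power * math.log(k)

if __name__ == "__main__":
    for k in (1e6, 1e10, 1e12, 1e15):
        print(k, beats_polynomial(k))
```

Under these formulas the stretched-exponential tail overtakes $\kappa^{-100}$ only somewhere between $\kappa = 10^{10}$ and $\kappa = 10^{12}$, which is consistent with the bound being claimed only for sufficiently large $\kappa$.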



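To obtain Eq. (5.2), recall from (5.7) that $-1 \le e^{-(w_i+w_j)s} - e^{-w_i s} - e^{-w_j s} \le 0$, implying that the random variable $F_{t+1}(\kappa^*)$ is 1-Lipschitz w.r.t. the aforementioned $\kappa$-dimensional product space. Furthermore, $\mathbb{E}[F_{t+1}(\kappa^*) \mid \mathcal{F}_t] \ge [G_t(1) + \varepsilon_t^{13/\varepsilon_t}]\kappa^* - 2$ due to Lemma 5.7, and by the same argument as before we conclude from Hoeffding's inequality that

$$\mathbb{P}\Bigl(F_{t+1}(\kappa^*) \le \bigl[G_t(1) + \varepsilon_t^{13/\varepsilon_t}\bigr]\kappa^* - \kappa^{2/3} \Bigm| \mathcal{F}_t\Bigr) \le \exp\Bigl(-\tfrac12\kappa^{1/3} + O\bigl(\kappa^{-1/3}\bigr)\Bigr) < \kappa^{-100}.$$

Rewriting this inequality in terms of $G_{t+1}$, with probability at least $1 - \kappa^{-100}$ we have

$$G_{t+1}\Bigl(\frac{\kappa^*}{\kappa_{t+1}}\Bigr) = \frac{F_{t+1}(\kappa^*)}{\kappa_{t+1}} > G_t(1) + \varepsilon_t^{13/\varepsilon_t} - \frac{2|\kappa_{t+1} - \kappa^*| + \kappa^{2/3}}{\kappa_{t+1}},$$

where we used that $[G_t(1) + \varepsilon_t^{13/\varepsilon_t}](\kappa^* - \kappa_{t+1}) \ge -(G_t(0) + 1)|\kappa_{t+1} - \kappa^*| = -2|\kappa_{t+1} - \kappa^*|$ due to $G_t(s)$ being decreasing in $s$. Moreover, since $G_{t+1}$ is 1-Lipschitz, as was shown in an earlier claim, in this event we have

$$G_{t+1}(1) \ge G_{t+1}\Bigl(\frac{\kappa^*}{\kappa_{t+1}}\Bigr) - \Bigl|1 - \frac{\kappa^*}{\kappa_{t+1}}\Bigr| > G_t(1) + \varepsilon_t^{13/\varepsilon_t} - \frac{3|\kappa_{t+1} - \kappa^*| + \kappa^{2/3}}{\kappa_{t+1}}.$$

Finally, recalling from Eq. (5.19) that $|\kappa_{t+1} - \kappa^*| \le \kappa^{2/3}$ except with a probability of at most $\kappa^{-100}$, we can conclude that with probability at least $1 - 2\kappa^{-100}$

$$G_{t+1}(1) > G_t(1) + \varepsilon_t^{13/\varepsilon_t} - \frac{4\kappa^{2/3}}{\kappa_{t+1}} \ge G_t(1) + \varepsilon_t^{13/\varepsilon_t} - 8\kappa^{-1/3},$$

where the last inequality used the fact that $\kappa_{t+1} \ge \kappa/2$ by definition of the coalescence process (since the merging pairs of clusters are always pairwise-disjoint). This yields (5.2) and therefore completes the proof of the proposition. ■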